Publications

2025

Tracking Somatic Mutations for Lineage Reconstruction
Neumeier Y., Raz O., Tao L., Marx Z. & Shapiro E. (2025) Lineage Tracing. Gracia-Marques J. & Lee T.(eds.). Vol. 2886. p. 23-45 Abstract
The human genome is composed of distinct genomic regions that are susceptible to various types of somatic mutations. Among these, Short Tandem Repeats (STRs) stand out as the most mutable genetic elements. STRs are short repetitive polymorphic sequences, predominantly situated within noncoding sectors of the genome. The intrinsic repetition characterizing these sequences makes them highly mutable in vivo. Consequently, this characteristic provides the chance to unravel the natural developmental history of human viable cells retrospectively. However, STRs also introduce stutter noise in vitro amplification, which makes their analysis challenging. Here we describe our integrated biochemical-computational platform for single-cell lineage analysis. It consists of a pipeline whose inputs are single cells and whose output is a lineage tree of input cells.

2024

United for change: deliberative coalition formation to change the status quo
Elkind E., Grossi D., Shapiro E. & Talmon N. (2024) Social Choice and Welfare. 63, 3-4, p. 717-746 Abstract
We study a setting in which a community wishes to identify a strongly supported proposal from a space of alternatives, in order to change the status quo. We describe a deliberation process in which agents dynamically form coalitions around proposals that they prefer over the status quo. We formulate conditions on the space of proposals and on the ways in which coalitions are formed that guarantee deliberation to succeed, that is, to terminate by identifying a proposal with the largest possible support. Our results provide theoretical foundations for the analysis of deliberative processes such as the ones that take place in online systems for democratic deliberation support.

2023

Cordial Miners: Fast and Efficient Consensus for Every Eventuality
Keidar I., Naor O., Poupko O. & Shapiro E. (2023) 37th International Symposium on Distributed Computing, DISC 2023. Oshman R.(eds.). Abstract
Cordial Miners are a family of efficient Byzantine Atomic Broadcast protocols, with instances for asynchrony and eventual synchrony. They improve the latency of state-of-the-art DAG-based protocols by almost 2× and achieve optimal good-case complexity of O(n) by forgoing Reliable Broadcast as a building block. Rather, Cordial Miners use the blocklace - a partially-ordered counterpart of the totally-ordered blockchain data structure - to implement the three algorithmic components of consensus: Dissemination, equivocation-exclusion, and ordering.
Brief Announcement: Grassroots Distributed Systems: Concept, Examples, Implementation and Applications
Shapiro E. (2023) 37th International Symposium on Distributed Computing, DISC 2023. Oshman R.(eds.). Abstract
Informally, a distributed system is grassroots if it is permissionless and can have autonomous, independently-deployed instances - geographically and over time - that may interoperate voluntarily once interconnected. More formally, in a grassroots system the set of all correct behaviors of a set of agents P is strictly included in the set of the correct behaviors of P when they are embedded within a larger set of agents P' ⊃ P. Grassroots systems are potentially important as they may allow communities to conduct their social, economic, civic, and political lives in the digital realm solely using their members' networked computing devices (e.g., smartphones), free of third-party control, surveillance, manipulation, coercion, or rent seeking (e.g., by global digital platforms such as Facebook or Bitcoin). Client-server/cloud computing systems are not grassroots, and neither are systems designed to have a single global instance (Bitcoin/Ethereum with hardwired seed miners/bootnodes), and systems that rely on a single global data structure (IPFS, DHTs). An example grassroots system would be a serverless smartphone-based social network supporting multiple independently-budding communities that can merge when a member of one community becomes also a member of another. Here, we formalize the notion of grassroots distributed systems; describe a grassroots dissemination protocol for the model of asynchrony and argue its safety, liveness, and being grassroots; extend the implementation to mobile (address-changing) devices that communicate via an unreliable network (e.g. smartphones using UDP); and discuss how grassroots dissemination can realize grassroots social networking and grassroots cryptocurrencies. The mathematical construction employs distributed multiagent transition systems to define the notions of grassroots protocols, to specify the grassroots dissemination protocols, and to prove their correctness. The protocols use the blocklace - a distributed, partially-ordered counterpart of the replicated, totally-ordered blockchain.
Grassroots Social Networking: Serverless, Permissionless Protocols for Twitter/LinkedIn/WhatsApp
Shapiro E. (2023) Proceedings of the 2023 Workshop on Open Challenges in Online Social Networks, OASIS 2023, Held in conjunction with the 34th ACM conference on Hypertext and Social Media, HT 2023. p. 14-21 Abstract
Offering a viable alternative architecture to centrally-controlled global digital platforms for social networking is an open challenge. Here we present a grassroots architecture for serverless, permissionless, peer-To-peer social networks termed grassroots social networking. The architecture is geared for roaming (address-changing) agents communicating over an unreliable network, e.g., smartphones communicating via UDP. The architecture incorporates (i) a decentralized social graph, where each member controls, maintains and stores only their local neighbourhood in the graph; (ii) member-created feeds, with authors and followers; and (iii) a grassroots dissemination protocol, in which communication occurs only along the edges of the social graph. The architecture realizes these components using the blocklace data structure-a distributed partially-ordered counterpart of the replicated totally-ordered blockchain. We provide two example grassroots social networking protocols-Twitter/LinkedIn-like and WhatsApp-like-and address their security (safety, liveness and privacy), spam/deep-fake resistance, and implementation, demonstrating how centrally-controlled social networks could be supplanted by a grassroots architecture.
No democracy, no academia
Albin E., Bressler S., Rolls A., Schwartz M. & Shapiro E. (2023) Science. 381, 6659, p. 715 Abstract
Proportional ranking in primary elections: A case study
Rosenfeld A., Shapiro E. & Talmon N. (2023) Party Politics. 29, 2, p. 335-346 Abstract
Many democratic political parties hold primary elections, which nicely reflects their democratic nature and promotes, among other things, the democratic value of inclusiveness. However, the methods currently used for holding such primary elections may not be the most suitable, especially if some form of proportional ranking is desired. In this paper, we compare different algorithmic methods for holding primaries (i.e., different aggregation methods for voters ballots) by evaluating the degree of proportional ranking that is achieved by each of them using real-world data. In particular, we compare six different algorithms by analyzing real-world data from a recent primary election conducted by the Israeli Democratit party. Technically, we analyze unique voter data and evaluate the proportionality achieved by means of cluster analysis, aiming at pinpointing the representation that is granted to different voter groups under each of the algorithmic methods considered. Our findings suggest that, contrary to the most-prominent primaries algorithm used (i.e., Approval), other methods such as Sequential Proportional Approval or Phragmen can bring about better proportional ranking and thus may be better suited for primary elections in practice.

2022

Whole-Genome Amplification-Surveying Yield, Reproducibility, and Heterozygous Balance, Reported by STR-Targeting MIPs
Raz O., Tao L., Biezuner T., Marx T., Neumeier Y., Tumanyan N. & Shapiro E. (2022) International Journal of Molecular Sciences. 23, 11, 6161. Abstract
Whole-genome amplification is a crucial first step in nearly all single-cell genomic analyses, with the following steps focused on its products. Bias and variance caused by the whole-genome amplification process add numerous challenges to the world of single-cell genomics. Short tandem repeats are sensitive genomic markers used widely in population genetics, forensics, and retrospective lineage tracing. A previous evaluation of common whole-genome amplification targeting ~1000 non-autosomal short tandem repeat loci is extended here to ~12,000 loci across the entire genome via duplex molecular inversion probes. Other than its improved scale and reduced noise, this system detects an abundance of heterogeneous short tandem repeat loci, allowing the allelic balance to be reported. We show here that while the best overall yield is obtained using RepliG-SC, the maximum uniformity between alleles and reproducibility across cells are maximized by Ampli1, rendering it the best candidate for the comparative heterozygous analysis of single-cell genomes.

2021

Efficient acquisition of tens of thousands of short tandem repeats in single-cell whole-genome-amplified DNA
Tao L., Marx Z., Raz O. & Shapiro E. (2021) STAR Protocols. 2, 4, 100828. Abstract
Short tandem repeats (STRs) are highly abundant in the human genome, but existing approaches for accurate genotyping of STRs are limited. Here, we describe a protocol for duplex molecular inversion probes for high-throughput and cost-effective STR enrichment. We have successfully tested panels targeting as many as 50K STRs in several thousands of genomic samples (e.g., HeLa cells, Du145 cells, leukemia cells, melanoma cells). However, because the protocol is plate based, the sample size is limited to a few thousand. For complete details on the use and execution of this protocol, please refer to Tao et al. (2021).
Building a Sybil-Resilient Digital Community Utilizing Trust-Graph Connectivity
Poupko O., Shahaf G., Shapiro E. & Talmon N. (2021) IEEE/ACM Transactions on Networking. 29, 5, p. 2215-2227 Abstract
Preventing fake or duplicate digital identities (aka sybils) from joining a digital community may be crucial to its survival, especially if it utilizes a consensus protocol among its members or employs democratic governance, where sybils can undermine consensus, tilt decisions, or even take over. Here, we explore the use of a trust-graph of identities, with edges representing trust among identity owners, to allow a community to grow indefinitely without increasing its sybil penetration. Since identities are admitted to the digital community based on their trust by existing digital community members, corrupt identities, which may trust sybils, also pose a threat to the digital community. Sybils and their corrupt perpetrators are together referred to as byzantines, and the overarching aim is to limit their penetration into a digital community. We propose two alternative tools to achieve this goal. One is graph conductance, which works under the assumption that honest people are averse to corrupt ones and tend to distrust them. The second is vertex expansion, which relies on the assumption that there are not too many corrupt identities in the community. Of particular interest is keeping the fraction of byzantines below one third, as it would allow the use of Byzantine Agreement (Lamport et al., 1982) for consensus as well as for sybil-resilient social choice (Shahaf et al., 2019). This paper considers incrementally growing a trust graph and shows that, under its key assumptions and additional requirements, including keeping the conductance or vertex expansion of the community trust graph sufficiently high, a community may grow safely, indefinitely.
In the Beginning There Were n Agents: Founding and Amending a Constitution
Abramowitz B., Shapiro E. & Talmon N. (2021) Algorithmic Decision Theory - 7th International Conference, ADT 2021, Proceedings. Vol. 13023. p. 119-131 Abstract
Consider n agents forming an egalitarian, self-governed community. Their first task is to decide on a decision rule to make further decisions. We start from a rather general initial agreement on the decision-making process based upon a set of intuitive and self-evident axioms, as well as simplifying assumptions about the preferences of the agents. From these humble beginnings we derive a decision rule. Crucially, the decision rule also specifies how it can be changed, or amended, and thus acts as a de facto constitution. Our main contribution is in providing an example of an initial agreement that is simple and intuitive, and a constitution that logically follows from it. The naive agreement is on the basic process of decision making - that agents approve or disapprove proposals; that their vote determines either the acceptance or rejection of each proposal; and on the axioms, which are requirements regarding a constitution that engenders a self-updating decision making process.
Benchmarked approaches for reconstruction of in vitro cell lineages and in silico models of C. elegans and M. musculus developmental trees
Gong W., Granados A. A., Hu J. et_al. (2021) Cell Systems. 12, 8, p. 810-826.e4 Abstract
The recent advent of CRISPR and other molecular tools enabled the reconstruction of cell lineages based on induced DNA mutations and promises to solve the ones of more complex organisms. To date, no lineage reconstruction algorithms have been rigorously examined for their performance and robustness across dataset types and number of cells. To benchmark such methods, we decided to organize a DREAM challenge using in vitro experimental intMEMOIR recordings and in silico data for a C. elegans lineage tree of about 1,000 cells and a Mus musculus tree of 10,000 cells. Some of the 22 approaches submitted had excellent performance, but structural features of the trees prevented optimal reconstructions. Using smaller sub-trees as training sets proved to be a good approach for tuning algorithms to reconstruct larger trees. The simulation and reconstruction methods here generated delineate a potential way forward for solving larger cell lineage trees such as in mouse.
Comparison of seven single cell whole genome amplification commercial kits using targeted sequencing
Biezuner T., Raz O., Amir S., Milo L., Adar R., Fried Y., Ainbinder E. & Shapiro E. (2021) Scientific Reports. 11, 1, 17171. Abstract
Advances in whole genome amplification (WGA) techniques enable understanding of the genomic sequence at a single cell level. Demand for single cell dedicated WGA kits (scWGA) has led to the development of several commercial kit. To this point, no robust comparison of all available kits was performed. Here, we benchmark an economical assay, comparing all commercially available scWGA kits. Our comparison is based on targeted sequencing of thousands of genomic loci, including highly mutable regions, from a large cohort of human single cells. Using this approach we have demonstrated the superiority of Ampli1 in genome coverage and of RepliG in reduced error rate. In summary, we show that no single kit is optimal across all categories, highlighting the need for a dedicated kit selection in accordance with experimental requirements.
Retrospective cell lineage reconstruction in humans by using short tandem repeats
Tao L., Raz O., Marx Z. et_al. (2021) Cell Reports Methods. 1, 3, 100054. Abstract
Cell lineage analysis aims to uncover the developmental history of an organism back to its cell of origin. Recently, novel in vivo methods utilizing genome editing enabled important insights into the cell lineages of animals. In contrast, human cell lineage remains restricted to retrospective approaches, which still lack resolution and cost-efficient solutions. Here, we demonstrate a scalable platform based on short tandem repeats targeted by duplex molecular inversion probes. With this human cell lineage tracing method, we accurately reproduced a known lineage of DU145 cells and reconstructed lineages of healthy and metastatic single cells from a melanoma patient who matched the anatomical reference while adding further refinements. This platform allowed us to faithfully recapitulate lineages of developmental tissue formation in healthy cells. In summary, our lineage discovery platform can profile informative somatic mutations efficiently and provides solid lineage reconstructions even in challenging low-mutation-rate healthy single cells.
Aggregation over metric spaces: Proposing and voting in elections, budgeting, and legislation
Bulteau L., Shahaf G., Shapiro E. & Talmon N. (2021) Journal of Artificial Intelligence Research. 70, p. 1413-1439 Abstract
We present a unifying framework encompassing a plethora of social choice settings. Viewing each social choice setting as voting in a suitable metric space, we offer a general model of social choice over metric spaces, in which-similarly to the spatial model of elections-each voter specifies an ideal element of the metric space. The ideal element acts as a vote, where each voter prefers elements that are closer to her ideal element. But it also acts as a proposal, thus making all participants equal not only as voters but also as proposers. We consider Condorcet aggregation and a continuum of solution concepts, ranging from minimizing the sum of distances to minimizing the maximum distance. We study applications of our abstract model to various social choice settings, including single-winner elections, committee elections, participatory budgeting, and participatory legislation. For each setting, we compare each solution concept to known voting rules and study various properties of the resulting voting rules. Our framework provides expressive aggregation for a broad range of social choice settings while remaining simple for voters; and may enable a unified and integrated implementation for all these settings, as well as unified extensions such as sybil-resiliency, proxy voting, and deliberative decision making.
Democratic Forking: Choosing Sides with Social Choice
Abramowitz B., Elkind E., Grossi D., Shapiro E. & Talmon N. (2021) Algorithmic Decision Theory - 7th International Conference, ADT 2021, Proceedings. Ríos Insua D. & Fotakis D.(eds.). p. 341-356 Abstract
Any community in which membership is voluntary may eventually break apart, or fork. For example, forks may occur in political parties, business partnerships, social groups, and cryptocurrencies. Forking may be the product of informal social processes or the organized action of an aggrieved minority or an oppressive majority. The aim of this paper is to provide a social choice framework in which agents can report preferences not only over a set of alternatives, but also over the possible forks that may occur in the face of disagreement. We study the resulting social choice setting, concentrating on stability issues, preference elicitation and strategy-proofness.

2020

Genuine Personal Identifiers and Mutual Sureties for Sybil-Resilient Community Growth
Shahaf G., Shapiro E. & Talmon N. (2020) Social Informatics - 12th International Conference, SocInfo 2020, Proceedings. Pedreschi D., Giannotti F., Grisolia F., Dignum F., Bontcheva K., Braghieri M. & Aref S.(eds.). Vol. 12467. p. 320-332 Abstract
Observing its lack, we introduce the notion of a genuine personal identifiera globally unique and singular identifier of a personand present a foundation for a decentralized, grassroots, bottom-up process by which every human being may create, own, and protect the privacy of a genuine personal identifier. Our solution is based on a web-of-trust and is designed for a distributed realization; we apply graph-theoretic notions to show that digital communities can grow indefinitely while ensuring bounded sybil penetration.
Digital social contracts: A foundation for an egalitarian and just digital society
Cardelli L., Orgad L., Shahaf G., Shapiro E. & Talmon N. (2020) CEUR Workshop Proceedings. 2781, p. 51-60 Abstract
Almost two centuries ago Pierre-Joseph Proudhon proposed social contracts - voluntary agreements among free people - as a foundation from which an egalitarian and just society can emerge. A digital social contract (DSC) is the novel incarnation of this concept for the digital age: a voluntary agreement between people that is specified, undertaken, and fulfilled in the digital realm. It embodies the notion of \u201ccode-is-law\u201d in its purest form, in that a DSC is a program - code in a social contracts programming language, which specifies the digital actions parties to the social contract may take; and the parties to the contract are entrusted, equally, with the task of ensuring that each party abides by the contract. Parties to a social contract are identified via their public keys, and the one and only type of action a party to a DSC may take is a \u201cdigital speech act\u201d - signing an utterance with her private key and sending it to the other parties to the contract. We present a formal definition of a DSC as agents that communicate asynchronously via digital speech acts, where the output of each agent is the input of all the other agents. We outline an abstract design for a social contracts programming language and hint on their applicability to social networks, sharing-economy, egalitarian currency networks, and democratic community governance.

Arxiv: 2005.06261

2019

Aggregation over Metric Spaces: Proposing and Voting in Elections, Budgeting, and Legislation
Shahaf G., Shapiro E. & Talmon N. (2019) ALGORITHMIC DECISION THEORY (ADT 2019). Venable KB. & Pekec S.(eds.). p. 172-174 Abstract

Arxiv: 1806.06277
Sybil-Resilient Conductance-Based Community Growth
Poupko O., Shahaf G., Shapiro E. & Talmon N. (2019) COMPUTER SCIENCE - THEORY AND APPLICATIONS. Kucherov G. & VanBevern R.(eds.). p. 359-371 (trueLecture Notes in Computer Science). Abstract
Preventing fake or duplicate e-identities (aka sybils) from joining an e-community may be crucial to its survival, especially if it utilizes a consensus protocol among its members or employs democratic governance, where sybils can undermine consensus, tilt decisions, or even take over. Here, we explore the use of a trust graph of identities, with trust edges representing trust among identity owners, to allow a community to grow without increasing its sybil penetration. Since identities are admitted to the e-community based on their trust by existing e-community members, corrupt identities, which may trust sybils, also pose a threat to the e-community. Sybils and their corrupt perpetrators are together referred to as Byzantines, and our overarching aim is to limit their penetration into an e-community. Our key tool in achieving this is graph conductance, and our key assumption is that honest people are averse to corrupt ones and tend to distrust them. Of particular interest is keeping the fraction of Byzantines below one third, as it would allow the use of Byzantine Agreement (see Lamport et al. The Byzantine generals problem, ACM Transactions on Programming Languages and Systems, 4(3):382-401, 1982) for consensus as well as for sybil-resilient social choice (see Shahaf et al., Sybil-resilient reality-aware social choice, arXiv preprint arXiv:1807.11105, 2019). We consider sequences of incrementally growing trust graphs and show that, under our key assumption and additional requirements, including keeping the conductance of the community trust graph sufficiently high, a community may grow safely.

Arxiv: 1901.00752
Genuine Personal Identifiers and Mutual Sureties for Sybil-Resilient Community Formation
Shahaf G., Shapiro E. & Talmon N. (2019) arXiv. Abstract
While most of humanity is suddenly on the net, the value of this singularity is hampered by the lack of credible digital identities: Social networking, person-to-person transactions, democratic conduct, cooperation and philanthropy are all hampered by the profound presence of fake identities, as illustrated by Facebook's removal of 5.4Bn fake accounts since the beginning of 2019. Here, we introduce the fundamental notion of a \emph{genuine personal identifier}---a globally unique and singular identifier of a person---and present a foundation for a decentralized, grassroots, bottom-up process in which every human being may create, own, and protect the privacy of a genuine personal identifier. The solution employs mutual sureties among owners of personal identifiers, resulting in a mutual-surety graph reminiscent of a web-of-trust. Importantly, this approach is designed for a distributed realization, possibly using distributed ledger technology, and does not depend on the use or storage of biometric properties. For the solution to be complete, additional components are needed, notably a mechanism that encourages honest behavior and a sybil-resilient governance system.

Submitted version
Short tandem repeat stutter model inferred from direct measurement of in vitro stutter noise
Raz O., Biezuner T., Spiro A. et_al. (2019) Nucleic Acids Research. 47, 5, p. 2436-2445 Abstract
Short tandem repeats (STRs) are polymorphic genomic loci valuable for various applications such as research, diagnostics and forensics. However, their polymorphic nature also introduces noise duringin vitroamplification, making them difficult to analyze. Although it is possible to overcome stutter noise by using amplification-free library preparation, such protocols are presently incompatible with single cell analysis and with targeted-enrichment protocols. To address this challenge, we have designed a method for direct measurement of in vitro noise. Using a synthetic STR sequencing library, we have calibrated a Markov model for the prediction of stutter patterns at any amplification cycle. By employing this model, we have managed to genotype accurately cases of severe amplification bias, and biallelic STR signals, and validated our model for several high-fidelity PCR enzymes. Finally, we compared this model in the context of a naive STR genotyping strategy against the state-of-the-art on a benchmark of single cells, demonstrating superior accuracy.
Sybil-resilient reality-aware social choice
Shahaf G., Shapiro E. & Talmon N. (2019) Proceedings of the 28th International Joint Conference on Artificial Intelligence, IJCAI 2019. Kraus S.(eds.). p. 572-579 Abstract
Sybil attacks, in which fake or duplicate identities (sybils) infiltrate an online community, pose a serious threat to such communities, as they might tilt community-wide decisions in their favor. While the extensive research on sybil identification may help keep the fraction of sybils in such communities low, it cannot however ensure their complete eradication. Thus, our goal is to enhance social choice theory with effective group decision mechanisms for communities with bounded sybil penetration. Inspired by Reality-Aware Social Choice [Shapiro and Talmon, 2018], we use the status quo as the anchor of sybil resilience, characterized by sybil safety - the inability of sybils to change the status quo against the will of the genuine agents, and sybil liveness - the ability of the genuine agents to change the status quo against the will of the sybils. We consider the social choice settings of deciding on a single proposal, on multiple proposals, and on updating a parameter. For each, we present social choice rules that are sybil-safe and, under certain conditions, satisfy sybil-liveness.

2018

Global Cryptodemocracy Is Possible and Desirable
Shapiro E. (2018) Debating Transformations of National Citizenship. Bauböck R.(eds.). Cham: . p. 343-351 (true IMISCOE Research Series). Abstract
The fascinating discussion kicked-off by Liav Orgad addresses the interplay between the clouds and earth: How do cloud citizens and cloud communities relate to their earthly counterparts? Orgad presented the vision of cloud communities, yet many commentators claim that what happens in the cloud stays in the cloud and that such communities cannot have relevance to earthly matters. I claim that much of this criticism loses its force if we go all the way and aim for a global cloud community of all global citizens. And I present a possible design for such a global democracy of global citizens based on an egalitarian cryptocurrency, which we could call `cryptodemocracy'.
point:: Foundations of Internet-enabled Democracy
Shapiro E. (2018) Communications of the ACM. 61, 8, p. 31-34 Abstract
Considering the possibility of achieving an e-democracy based on long-established foundations that strengthen both real-world democracies and virtual Internet communities.

Arxiv: 1710.02873
Sybil-Resilient Reality-Aware Social Choice
Shahaf G., Shapiro E. & Talmon N. (2018) arXiv. Abstract
Sybil attacks, in which fake or duplicate identities (\emph{sybils}) infiltrate an online community, pose a serious threat to such communities, as they might tilt community-wide decisions in their favor. While the extensive research on sybil identification may help keep the fraction of sybils in such communities low, it cannot however ensure their complete eradication. Thus, our goal is to enhance social choice theory with effective group decision mechanisms for communities with bounded sybil penetration. Inspired by Reality-Aware Social Choice, we use the status quo as the anchor of \emph{sybil resilience}, characterized by \emph{sybil safety} -- the inability of sybils to change the status quo against the will of the genuine agents, and \emph{sybil liveness} -- the ability of the genuine agents to change the status quo against the will of the sybils. We consider the social choice settings of deciding on a single proposal, on multiple proposals, and on updating a parameter. For each, we present social choice rules that are sybil-safe and, under certain conditions, satisfy sybil-liveness.

Submitted version
On the journey from nematode to human, scientists dive by the zebrafish cell lineage tree
Shapiro E. (2018) Genome Biology. 19, 1, 63. Abstract
Three recent single-cell papers use novel CRISPR-Cas9-sgRNA genome editing methods to shed light on the zebrafish cell lineage tree.
A biological-computational human cell lineage discovery platform based on duplex molecular inversion probes
Tao L., Raz O., Marx Z. et_al. (2018) BioRxiv. Abstract
Short Tandem Repeats (STRs) are highly mutable genomic elements composed of repetitive short motifs, widely distributed within the human genome, and as such are the most promising source for somatic genomic variations. We present an affordable and scalable cell lineage reconstruction platform that combines customizable duplex Molecular Inversion Probes (MIPs), high throughput targeted sequencing and tailored analysis, all integrated in a bioinformatics Database Management System (DBMS). By applying this platform to a benchmark of ex vivo lineage samples, we demonstrate efficient acquisition of tens of thousands of targets in single-cell whole-genome amplified DNA and the discovery of lineage relations among these cells with superior accuracy. We then reconstruct a cell lineage tree of healthy and metastatic cells from a melanoma patient, supporting the hypothesis of clonal metastases and demonstrating that a naïve panel targeting STR somatic mutations in single cells can outperform a cancer specific SNP panel in reconstruction accuracy.
Incorporating Reality into Social Choice
Shapiro E. & Talmon N. (2018) Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems. p. 1188-1192 (trueAAMAS '18). Abstract
When voting on a proposal one in fact chooses between two alternatives: (i) A new hypothetical social state depicted by the proposal and (ii) the status quo (henceforth: Reality); a Yes vote favors a transition to the proposed hypothetical state, while a No vote favors Reality. Social Choice theory generalizes voting on one proposal to ranking multiple proposals; that Reality was forsaken during this generalization is, in our view, inexplicable. Here we propose to rectify this neglect and incorporate Reality into Social Choice, distinguishing Reality from hypothesis. We show that doing so: (i) Offers a natural resolution to Condorcet's paradox; (ii) Explains what approval voters approve; (iii) Produces a simple and efficient Condorcet-consistent show-of-hands agenda; (iv) Produces democratic action plans, which start with Reality and proceed in democratically-supported transitions; and (v) Nullifies Independence of Irrelevant Alternatives and hence abdicates Arrow's Theorem. Arrow's theorem was taken to show that democracy, conceived as government by the will of the people, is an incoherent illusion. Incorporating Reality into Social Choice may clear this intellectual blemish on democracy and offer a coherent, simple, efficient, easy to communicate, and trustworthy path forward to democracy.

2017

Science Forum: The Human Cell Atlas
Regev A., Teichmann S. A., Lander E. S. et_al. (2017) eLife. 6, e27041. Abstract
The recent advent of methods for high-throughput single-cell molecular profiling has catalyzed a growing sense in the scientific community that the time is ripe to complete the 150-year-old effort to identify all cell types in the human body. The Human Cell Atlas Project is an international collaborative effort that aims to define all human cell types in terms of distinctive molecular profiles (such as gene expression profiles) and to connect this information with classical cellular descriptions (such as location and morphology). An open comprehensive reference map of the molecular state of cells in healthy human tissues would propel the systematic study of physiological states, developmental trajectories, regulatory circuitry and interactions of cells, and also provide a framework for understanding cellular dysregulation in human disease. Here we describe the idea, its potential utility, early proofs-of-concept, and some design considerations for the Human Cell Atlas, including a commitment to open data, code, and community.
A Participatory Democratic Budgeting Algorithm
Shapiro E. & Talmon N. (2017) arXiv. Abstract
The budget is the key means for effecting policy in democracies, yet its preparation is typically an excluding, opaque, and arcane process. We aim to rectify this by providing for the democratic creation of complete budgets --- for cooperatives, cities, or states. Such budgets are typically (i) prepared, discussed, and voted upon by comparing and contrasting with last-year's budget, (ii) quantitative, in that items appear in quantities with potentially varying costs, and (iii) hierarchical, reflecting the organization's structure. Our process can be used by a budget committee, the legislature or the electorate at large. We allow great flexibility in vote elicitation, from perturbing last-year's budget to a complete ranked budget proposal. We present a polynomial-time algorithm which takes such votes, last-year's budget, and a budget limit as input and produces a budget that is provably "democratically optimal" (Condorcet-consistent), in that no proposed change to it has majority support among the votes.

Arxiv: 1709.05839

2016

A generic, cost-effective, and scalable cell lineage analysis platform
Biezuner T., Spiro A., Raz O. et_al. (2016) Genome Research. 26, 11, p. 1588-1599 Abstract
Advances in single-cell genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells. Current sequencing-based methods for cell lineage analysis depend on low-resolution bulk analysis or rely on extensive single-cell sequencing, which is not scalable and could be biased by functional dependencies. Here we show an integrated biochemical-computational platform for generic single-cell lineage analysis that is retrospective, cost-effective, and scalable. It consists of a biochemical-computational pipeline that inputs individual cells, produces targeted single-cell sequencing data, and uses it to generate a lineage tree of the input cells. We validated the platform by applying it to cells sampled from an ex vivo grown tree and analyzed its feasibility landscape by computer simulations. We conclude that the platform may serve as a generic tool for lineage analysis and thus pave the way toward large-scale human cell lineage discovery.
Erratum to: ESTGt: A programming and simulation environment for population dynamics. [BMC Bioinformatics., 17 (2016): 187]
Spiro A. & Shapiro E. (2016) BMC Bioinformatics. 17, 1, 303. Abstract
Accuracy of Answers to Cell Lineage Questions Depends on Single-Cell Genomics Data Quality and Quantity.
Spiro A. & Shapiro E. (2016) PLoS Computational Biology. 12, 6, 1004983. Abstract
Advances in single-cell (SC) genomics enable commensurate improvements in methods for uncovering lineage relations among individual cells, as determined by phylogenetic analysis of the somatic mutations harbored by each cell. Theoretically, complete and accurate knowledge of the genome of each cell of an individual can produce an extremely accurate cell lineage tree of that individual. However, the reality of SC genomics is that such complete and accurate knowledge would be wanting, in quality and in quantity, for the foreseeable future. In this paper we offer a framework for systematically exploring the feasibility of answering cell lineage questions based on SC somatic mutational analysis, as a function of SC genomics data quality and quantity. We take into consideration the current limitations of SC genomics in terms of mutation data quality, most notably amplification bias and allele dropouts (ADO), as well as cost, which puts practical limits on mutation data quantity obtained from each cell as well as on cell sample density. We do so by generating in silico cell lineage trees using a dedicated formal language, eSTG, and show how the ability to answer correctly a cell lineage question depends on the quality and quantity of the SC mutation data. The presented framework can serve as a baseline for the potential of current SC genomics to unravel cell lineage dynamics, as well as the potential contributions of future advancement, both biochemical and computational, for the task.
eSTGt: A programming and simulation environment for population dynamics
Spiro A. & Shapiro E. (2016) BMC Bioinformatics. 17, 1, 187. Abstract
Background: We have previously presented a formal language for describing population dynamics based on environment-dependent Stochastic Tree Grammars (eSTG). The language captures in broad terms the effect of the changing environment while abstracting away details on interaction among individuals. An eSTG program consists of a set of stochastic tree grammar transition rules that are context-free. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count and elapsed time. In addition, each individual may have an internal state, which can change during transitions. Results: This paper presents eSTGt (eSTG tool), an eSTG programming and simulation environment. When executing a program, the tool generates the corresponding lineage trees as well as the internal states values, which can then be analyzed either through the tool's GUI or using MATLAB's command-line environment. Conclusions: The presented tool allows researchers to use existing biological knowledge in order to model the dynamics of a developmental process and analyze its behavior throughout the historical events. Simulated lineage trees can be used to validate various hypotheses in silico and to predict the behavior of dynamical systems under various conditions. Written under MATLAB environment, the tool also enables to easily integrate the output data within the user's downstream analysis.
Synthesis and cell-free cloning of DNA libraries using programmable microfluidics
Ben-Yehezkel T., Rival A., Raz O. et_al. (2016) Nucleic Acids Research. 44, 4, e35. Abstract
Microfluidics may revolutionize our ability to write synthetic DNA by addressing several fundamental limitations associated with generating novel genetic constructs. Here we report the first de novo synthesis and cell-free cloning of custom DNA libraries in sub-microliter reaction droplets using programmable digital microfluidics. Specifically, we developed Programmable Order Polymerization (POP), Microfluidic Combinatorial Assembly of DNA (M-CAD) and Microfluidic In-vitro Cloning (MIC) and applied them to de novo synthesis, combinatorial assembly and cell-free cloning of genes, respectively. Proof-of-concept for these methods was demonstrated by programming an autonomous microfluidic system to construct and clone libraries of yeast ribosome binding sites and bacterial Azurine, which were then retrieved in individual droplets and validated. The ability to rapidly and robustly generate designer DNA molecules in an autonomous manner should have wide application in biological research and development.

2015

Rationally designed, heterologous s. cerevisiae transcripts expose novel expression determinants
Ben-Yehezkel T., Atar S., Zur H. et_al. (2015) RNA Biology. 12, 9, p. 972-984 Abstract
Deducing generic causal relations between RNA transcript features and protein expression profiles from endogenous gene expression data remains a major unsolved problem in biology. The analysis of gene expression from heterologous genes contributes significantly to solving this problem, but has been heavily biased toward the study of the effect of 5 transcript regions and to prokaryotes. Here, we employ a synthetic biology driven approach that systematically differentiates the effect of different regions of the transcript on gene expression up to 240 nucleotides into the ORF. This enabled us to discover new causal effects between features in previously unexplored regions of transcripts, and gene expression in natural regimes. We rationally designed, constructed, and analyzed 383 gene variants of the viral HRSVgp04 gene ORF, with multiple synonymous mutations at key positions along the transcript in the eukaryote S. cerevisiae. Our results show that a few silent mutations at the 5UTR can have a dramatic effect of up to 15 fold change on protein levels, and that even synonymous mutations in positions more than 120 nucleotides downstream from the ORF 5end can modulate protein levels up to 160%-300%. We demonstrate that the correlation between protein levels and folding energy increases with the significance of the level of selection of the latter in endogenous genes, reinforcing the notion that selection for folding strength in different parts of the ORF is related to translation regulation. Our measured protein abundance correlates notably(correlation up to r = 0.62 (p=0.0013)) with mean relative codon decoding times, based on ribosomal densities (Ribo-Seq) in endogenous genes, supporting the conjecture that translation elongation and adaptation to the tRNA pool can modify protein levels in a causal/direct manner. This report provides an improved understanding of transcript evolution, design principles of gene expression regulation, and suggests simple rules for engineering synthetic gene expression in eukaryotes.

2014

Heuristic for Maximizing DNA Reuse in Synthetic DNA Library Assembly
Blakes J., Raz O., Feige U., Bacardit J., Widera P., Ben-Yehezkel T., Shapiro E. & Krasnogor N. (2014) ACS Synthetic Biology. 3, 8, p. 529-542 Abstract
De novo DNA synthesis is in need of new ideas for increasing production rate and reducing cost. DNA reuse in combinatorial library construction is one such idea. Here, we describe an algorithm for planning multistage assembly of DNA libraries with shared intermediates that greedily attempts to maximize DNA reuse, and show both theoretically and empirically that it runs in linear time. We compare solution quality and algorithmic performance to the best results reported for computing DNA assembly graphs, finding that our algorithm achieves solutions of equivalent quality but with dramatically shorter running times and substantially improved scalability. We also show that the related computational problem bounded-depth min-cost string production (BDMSP), which captures DNA library assembly operations with a simplified cost model, is NP-hard and APX-hard by reduction from vertex cover. The algorithm presented here provides solutions of near-minimal stages and thanks to almost instantaneous planning of DNA libraries it can be used as a metric of "manufacturability" to guide DNA library design. Rapid planning remains applicable even for DNA library sizes vastly exceeding today's biochemical assembly methods, future-proofing our method.
Lineage grammars: Describing, simulating and analyzing population dynamics
Spiro A., Cardelli L. & Shapiro E. (2014) BMC Bioinformatics. 15, 1, 249. Abstract
Background: Precise description of the dynamics of biological processes would enable the mathematical analysis and computational simulation of complex biological phenomena. Languages such as Chemical Reaction Networks and Process Algebras cater for the detailed description of interactions among individuals and for the simulation and analysis of ensuing behaviors of populations. However, often knowledge of such interactions is lacking or not available. Yet complete oblivion to the environment would make the description of any biological process vacuous. Here we present a language for describing population dynamics that abstracts away detailed interaction among individuals, yet captures in broad terms the effect of the changing environment, based on environment-dependent Stochastic Tree Grammars (eSTG). It is comprised of a set of stochastic tree grammar transition rules, which are context-free and as such abstract away specific interactions among individuals. Transition rule probabilities and rates, however, can depend on global parameters such as population size, generation count, and elapsed time. Results: We show that eSTGs conveniently describe population dynamics at multiple levels including cellular dynamics, tissue development and niches of organisms. Notably, we show the utilization of eSTG for cases in which the dynamics is regulated by environmental factors, which affect the fate and rate of decisions of the different species. eSTGs are lineage grammars, in the sense that execution of an eSTG program generates the corresponding lineage trees, which can be used to analyze the evolutionary and developmental history of the biological system under investigation. These lineage trees contain a representation of the entire events history of the system, including the dynamics that led to the existing as well as to the extinct individuals. Conclusions: We conclude that our suggested formalism can be used to easily specify, simulate and analyze complex biological systems, and supports modular description of local biological dynamics that can be later used as \u201cblack boxes\u201d in a larger scope, thus enabling a gradual and hierarchical definition and simulation of complex biological systems. The simple, yet robust formalism enables to target a broad class of stochastic dynamic behaviors, especially those that can be modeled using global environmental feedback regulation rather than direct interaction between individuals.
Accurate, Model-Based Tuning of Synthetic Gene Expression Using Introns in S. cerevisiae
Yofe I., Zafrir Z., Blau R., Schuldiner M., Tuller T., Shapiro E. & Ben-Yehezkel T. (2014) PLoS Genetics. 10, 6, e1004407. Abstract
Introns are key regulators of eukaryotic gene expression and present a potentially powerful tool for the design of synthetic eukaryotic gene expression systems. However, intronic control over gene expression is governed by a multitude of complex, incompletely understood, regulatory mechanisms. Despite this lack of detailed mechanistic understanding, here we show how a relatively simple model enables accurate and predictable tuning of synthetic gene expression system in yeast using several predictive intron features such as transcript folding and sequence motifs. Using only natural Saccharomyces cerevisiae introns as regulators, we demonstrate fine and accurate control over gene expression spanning a 100 fold expression range. These results broaden the engineering toolbox of synthetic gene expression systems and provide a framework in which precise and robust tuning of gene expression is accomplished.
Single cell analysis exposes intratumor heterogeneity and suggests that FLT3-ITD is a late event in leukemogenesis
Shouval R., Shlush L. I., Yehudai-Resheff S., Ali S., Pery N., Shapiro E., Tzukerman M., Rowe J. M. & Zuckerman T. (2014) Experimental Hematology. 42, 6, p. 457-463 Abstract
FMS-like tyrosine kinase 3 receptor internal tandem duplication (FLT3-ITD) commonly occurs in acute myeloid leukemia and is considered rare in acute lymphocytic leukemia. Acute leukemia has poor prognosis, mainly due to relapse. Standard FLT3-ITD diagnostic techniques are based on genomic polymerase chain reaction and have recently incorporated GeneScan (Applied Biosystems, Foster City, CA) to identify variations of the FLT3 gene. As this is an average-based assay utilized in a heterogeneous leukemic cell population, we hypothesized that cells of acute leukemia, considered FLT3-ITD-negative by standard methods, could possess a fraction of FLT3-ITD-positive cells. The present study employed single cell mutation analysis to evaluate the FLT3-ITD status in newly diagnosed acute myeloid leukemia (n = 5) and acute lymphocytic leukemia (n = 3) patients. A total of 541 single leukemic cells and 36 mononuclear cells from healthy volunteers were analyzed. Seven patients, considered FLT3-ITD-negative according to bulk DNA analysis, appeared to possess a small fraction of FLT3-ITD-positive cells based on single cell analysis. Moreover, this approach revealed the heterogeneity of the tumor as evident by different FLT3-ITD mutations present in the same patient. The presence of a minor clone carrying FLT3-ITD in almost all patients tested provides evidence that this lesion is a common late event in leukemogenesis. Additionally, 3 relapsed patients demonstrated loss of heterozygosity of the normal allele, affecting 25 %-100% of the cells found to be FLT3-ITD-positive. Though further clinical testing is warranted, these findings may have implications on the prognostic significance of FLT3-ITD and the use of targeted therapy.
Point of View: Correcting the bias against interdisciplinary research
Shapiro E. (2014) eLife. 3, 3, 02576. Abstract
When making decisions about funding and jobs the scientific community should recognise that most of the tools used to evaluate scientific excellence are biased in favour of established disciplines and against interdisciplinary research.

2013

A synthetic biology approach for evaluating the functional contribution of designer cellulosome components to deconstruction of cellulosic substrates
Vazana Y., Barak Y., Unger T. et_al. (2013) Biotechnology for Biofuels. 6, 1, 182. Abstract
Background: Select cellulolytic bacteria produce multi-enzymatic cellulosome complexes that bind to the plant cell wall and catalyze its efficient degradation. The multi-modular interconnecting cellulosomal subunits comprise dockerin-containing enzymes that bind cohesively to cohesin-containing scaffoldins. The organization of the modules into functional polypeptides is achieved by intermodular linkers of different lengths and composition, which provide flexibility to the complex and determine its overall architecture. Results: Using a synthetic biology approach, we systematically investigated the spatial organization of the scaffoldin subunit and its effect on cellulose hydrolysis by designing a combinatorial library of recombinant trivalent designer scaffoldins, which contain a carbohydrate-binding module (CBM) and 3 divergent cohesin modules. The positions of the individual modules were shuffled into 24 different arrangements of chimaeric scaffoldins. This basic set was further extended into three sub-sets for each arrangement with intermodular linkers ranging from zero (no linkers), 5 (short linkers) and native linkers of 27-35 amino acids (long linkers). Of the 72 possible scaffoldins, 56 were successfully cloned and 45 of them expressed, representing 14 full sets of chimaeric scaffoldins. The resultant 42-component scaffoldin library was used to assemble designer cellulosomes, comprising three model C. thermocellum cellulases. Activities were examined using Avicel as a pure microcrystalline cellulose substrate and pretreated cellulose-enriched wheat straw as a model substrate derived from a native source. All scaffoldin combinations yielded active trivalent designer cellulosome assemblies on both substrates that exceeded the levels of the free enzyme systems. A preferred modular arrangement for the trivalent designer scaffoldin was not observed for the three enzymes used in this study, indicating that they could be integrated at any position in the designer cellulosome without significant effect on cellulose-degrading activity. Designer cellulosomes assembled with the long-linker scaffoldins achieved higher levels of activity, compared to those assembled with short-and no-linker scaffoldins. Conclusions The results demonstrate the robustness of the cellulosome system. Long intermodular scaffoldin linkers are preferable, thus leading to enhanced degradation of cellulosic substrates, presumably due to the increased flexibility and spatial positioning of the attached enzymes in the complex. These findings provide a general basis for improved designer cellulosome systems as a platform for bioethanol production.
Comparing Algorithms That Reconstruct Cell Lineage Trees Utilizing Information on Microsatellite Mutations
Chapal Ilani I. N., Maruvka Y. E., Spiro A., Reizel Y., Adar R., Shlush L. I. & Shapiro E. (2013) PLoS Computational Biology. 9, 11, 1003297. Abstract
Organism cells proliferate and die to build, maintain, renew and repair it. The cellular history of an organism up to any point in time can be captured by a cell lineage tree in which vertices represent all organism cells, past and present, and directed edges represent progeny relations among them. The root represents the fertilized egg, and the leaves represent extant and dead cells. Somatic mutations accumulated during cell division endow each organism cell with a genomic signature that is unique with a very high probability. Distances between such genomic signatures can be used to reconstruct an organism's cell lineage tree. Cell populations possess unique features that are absent or rare in organism populations (e. g., the presence of stem cells and a small number of generations since the zygote) and do not undergo sexual reproduction, hence the reconstruction of cell lineage trees calls for careful examination and adaptation of the standard tools of population genetics. Our lab developed a method for reconstructing cell lineage trees by examining only mutations in highly variable microsatellite loci (MS, also called short tandem repeats, STR). In this study we use experimental data on somatic mutations in MS of individual cells in human and mice in order to validate and quantify the utility of known lineage tree reconstruction algorithms in this context. We employed extensive measurements of somatic mutations in individual cells which were isolated from healthy and diseased tissues of mice and humans. The validation was done by analyzing the ability to infer known and clear biological scenarios. In general, we found that if the biological scenario is simple, almost all algorithms tested can infer it. Another somewhat surprising conclusion is that the best algorithm among those tested is Neighbor Joining where the distance measure used is normalized absolute distance. We include our full dataset in Tables S1, S2, S3, S4, S5 to enable further analysis of this data by others.
Mapping the translation initiation landscape of an S. cerevisiae gene using fluorescent proteins
Ben-Yehezkel T., Zur H., Marx T., Shapiro E. & Tuller T. (2013) Genomics. 102, 4, p. 419-429 Abstract
Accurate and efficient gene expression requires that protein translation initiates from mRNA transcripts with high fidelity. At the same time, indiscriminate initiation of translation from multiple ATG start-sites per transcript has been demonstrated, raising fundamental questions regarding the rate and rationale governing alternative translation initiation. We devised a sensitive fluorescent reporter assay for monitoring alternative translation initiation. To demonstrate it, we map the translation initiation landscape of a Saccharomyces cerevisiae gene (RMD1) with a typical ATG sequence context profile. We found that up to 3%-5% of translation initiation events occur from alternative out-of-frame start codons downstream of the main ATG. Initiation from these codons follows the ribosome scanning model: initiation rates from different start sites are determined by ATG order, rather than their context strength. Genomic analysis of S. cerevisiae further supports the scanning model: ATG codons downstream rather than upstream of the main ATG tend to have higher context scores.
DNA computing Molecules reach consensus: Molecules reach consensus
Shapiro E. & Ran T. (2013) Nature Nanotechnology. 8, 10, p. 703-705 Abstract
DNA molecules can be programmed to execute any dynamic process of chemical kinetics and can implement an algorithm for achieving consensus between multiple agents.
Single-cell sequencing-based technologies will revolutionize whole-organism science
Shapiro E., Biezuner T. & Linnarsson S. (2013) Nature Reviews Genetics. 14, 9, p. 618-630 Abstract
The unabated progress in next-generation sequencing technologies is fostering a wave of new genomics, epigenomics, transcriptomics and proteomics technologies. These sequencing-based technologies are increasingly being targeted to individual cells, which will allow many new and longstanding questions to be addressed. For example, single-cell genomics will help to uncover cell lineage relationships; single-cell transcriptomics will supplant the coarse notion of marker-based cell types; and single-cell epigenomics and proteomics will allow the functional states of individual cells to be analysed. These technologies will become integrated within a decade or so, enabling high-throughput, multi-dimensional analyses of individual cells that will produce detailed knowledge of the cell lineage trees of higher organisms, including humans. Such studies will have important implications for both basic biological research and medicine.
Stem cell clonality - Theoretical concepts, experimental techniques, and clinical challenges
Glauche I., Bystrykh L., Eaves C. et_al. (2013) Blood Cells, Molecules, and Diseases. 50, 4, p. 232-240 Abstract
Here we report highlights of discussions and results presented at an International Workshop on Concepts and Models of Stem Cell Organization held on July 16th and 17th, 2012 in Dresden, Germany. The goal of the workshop was to undertake a systematic survey of state-of-the-art methods and results of clonality studies of tissue regeneration and maintenance with a particular emphasis on the hematopoietic system. The meeting was the 6th in a series of similar conceptual workshops, termed StemCellMathLab,. ²2Previous StemCellMathLabs were held in 2001, 2005, 2007 (Leipzig, Germany), 2008 [1] (London, UK), and 2010 [2] (Dresden, Germany). all of which have had the general objective of using an interdisciplinary approach to discuss specific aspects of stem cell biology. The StemCellMathLab 2012, which was jointly organized by the Institute for Medical Informatics and Biometry, Medical Faculty Carl Gustav Carus, Dresden University of Technology and the Institute for Medical Informatics, Statistics and Epidemiology, Medical Faculty, University of Leipzig, brought together 32 scientists from 8 countries, with scientific backgrounds in medicine, cell biology, virology, physics, computer sciences, bioinformatics and mathematics. The workshop focused on the following questions: (1) How heterogeneous are stem cells and their progeny? and (2) What are the characteristic differences in the clonal dynamics between physiological and pathophysiological situations? In discussing these questions, particular emphasis was placed on (a) the methods for quantifying clones and their dynamics in experimental and clinical settings and (b) general concepts and models for their description. In this workshop summary we start with an introduction to the current state of clonality research and a proposal for clearly defined terminology. Major topics of discussion include clonal heterogeneity in unperturbed tissues, clonal dynamics due to physiological and pathophysiological pressures and conceptual and technical issues of clone quantification. We conclude that an interactive cross-disciplinary approach to research in this field will continue to promote a conceptual understanding of tissue organization.
A library of programmable DNAzymes that operate in a cellular environment
Kahan-Hanum M., Douek Y., Adar R. & Shapiro E. (2013) Scientific Reports. 3, 1535. Abstract
DNAzymes were used as inhibitory agents in a variety of experimental disease settings, such as cancer, viral infections and even HIV. Drugs that become active only upon the presence of preprogrammed abnormal environmental conditions may enable selective molecular therapy by targeting abnormal cells without injuring normal cells. Here we show a novel programmable DNAzyme library composed of variety of Boolean logic gates, including YES, AND, NOT, OR, NAND, ANDNOT, XOR, NOR and 3-input-AND gate, that uses both miRNAs and mRNAs as inputs. Each gate is based on the c-jun cleaving Dz13 DNAzyme and active only in the presence of specific input combinations. The library is modular, supports arbitrary inputs and outputs, cascadable, highly specific and robust. We demonstrate the library's potential diagnostic abilities on miRNA and mRNA combinations in cell lysate and its ability to operate in a cellular environment by using beacon-like c-jun mimicking substrate in living mammalian cells.
A new algorithm for combinatorial dna library assembly
Blakes J., Raz O., Krasnogor N. & Shapiro E. (2013) 5th International Workshop on Bio-Design Automation (IWBDA 2013). Abstract
We describe a novel greedy algorithm for computing near-optimal DNA assembly graphs and show empirically that it runs in linear time, enabling almost instantaneous planning of DNA library sizes exceeding the capacity of todays biochemical assembly methods. We compare assembly graph quality and algorithmic performance to the results obtained in [1], demonstrating that they are significantly faster to obtain and equivalent to the best results for DNA library assembly with intermediate reuse found in the literature.

2012

Correction: Programmable In Vivo Selection of Arbitrary DNA Sequences
Ben Yehezkel T., Biezuner T., Linshiz G., Mazor Y. & Shapiro E. (2012) PLoS ONE. 7, 11, Abstract
The Author Contributions are incorrect. They should be: Conceived the experiments: TBY GL. Designed the experiments: TB TBY GL. Performed the experiments: TB GL TBY. Analyzed the data: TB TBY GL YM. Wrote the paper: TBY TB ES.
Programmable In Vivo Selection of Arbitrary DNA Sequences
Ben-Yehezkel T., Biezuner T., Linshiz G., Mazor Y. & Shapiro E. (2012) PLoS ONE. 7, 11, e47795. Abstract
The extraordinary fidelity, sensory and regulatory capacity of natural intracellular machinery is generally confined to their endogenous environment. Nevertheless, synthetic bio-molecular components have been engineered to interface with the cellular transcription, splicing and translation machinery in vivo by embedding functional features such as promoters, introns and ribosome binding sites, respectively, into their design. Tapping and directing the power of intracellular molecular processing towards synthetic bio-molecular inputs is potentially a powerful approach, albeit limited by our ability to streamline the interface of synthetic components with the intracellular machinery in vivo. Here we show how a library of synthetic DNA devices, each bearing an input DNA sequence and a logical selection module, can be designed to direct its own probing and processing by interfacing with the bacterial DNA mismatch repair (MMR) system in vivo and selecting for the most abundant variant, regardless of its function. The device provides proof of concept for programmable, function-independent DNA selection in vivo and provides a unique example of a logical-functional interface of an engineered synthetic component with a complex endogenous cellular system. Further research into the design, construction and operation of synthetic devices in vivo may lead to other functional devices that interface with other complex cellular processes for both research and applied purposes.
A programmable NOR-based device for transcription profile analysis
Ran T., Douek Y., Milo L. & Shapiro E. (2012) Scientific Reports. 2, 641. Abstract
An autonomous synthetic programmable device that can diagnose a cell's state according to predefined markers and produce a corresponding therapeutic output may be the basis of future programmable drugs. Motivated to increase diagnosis precision, devices that integrate multiple disease markers have been implemented based on various molecular tools. As simplicity is key to future in-vivo applications, we sought a molecular device that a) integrates multiple inputs without requiring pairwise interactions, and b) harnesses only mechanisms that cells natively use. Here we show a synthetic NOR-based programmable device, operating via a biochemical obstructing approach rather than on a constructive approach, capable of differentiating between prokaryotic cell strains based on their unique expression profile. To demonstrate our system's strengths we further implemented the NOT, OR and AND gates. The device's programmability allows context-dependent selection of the inputs being sensed, and of the expressed output, thus, holding great promise in future biomedical applications.
Cell lineage analysis of acute leukemia relapse uncovers the role of replication-rate heterogeneity and microsatellite instability
Shlush L. I., Chapal Ilani I. N., Adar R. et_al. (2012) Blood. 120, 3, p. 603-612 Abstract
Human cancers display substantial intratumoral genetic heterogeneity, which facilitates tumor survival under changing microenvironmental conditions. Tumor substructure and its effect on disease progression and relapse are incompletely understood. In the present study, a high-throughput method that uses neutral somatic mutations accumulated in individual cells to reconstruct cell lineage trees was applied to hundreds of cells of human acute leukemia harvested from multiple patients at diagnosis and at relapse. The reconstructed cell lineage trees of patients with acute myeloid leukemia showed that leukemia cells at relapse were shallow (divide rarely) compared with cells at diagnosis and were closely related to their stem cell subpopulation, implying that in these instances relapse might have originated from rarely dividing stem cells. In contrast, among patients with acute lymphoid leukemia, no differences in cell depth were observed between diagnosis and relapse. In one case of chronic myeloid leukemia, at blast crisis, most of the cells at relapse were mismatch-repair deficient. In almost all leukemia cases, > 1 lineage was observed at relapse, indicating that diverse mechanisms can promote relapse in the same patient. In conclusion, diverse relapse mechanisms can be observed by systematic reconstruction of cell lineage trees of patients with leukemia.
Identification of CTCF as a master regulator of the clustered protocadherin genes
Golan-Mashiach M., Grunspan M., Emmanuel R., Gibbs-Bar L., Dikstein R. & Shapiro E. (2012) Nucleic Acids Research. 40, 8, p. 3378-3391 Abstract
The brain is a large and complex network of neurons. Specific neuronal connectivity is thought to be based on the combinatorial expression of the 52 protocadherins (Pcdh) membrane adhesion proteins, whereby each neuron expresses only a specific subset. Pcdh genes are arranged in tandem, in a cluster of three families: Pcdhα, Pcdhβ and Pcdhγ. The expression of each Pcdh gene is regulated by a promoter that has a regulatory conserved sequence element (CSE), common to all 52 genes. The mechanism and factors controlling individual Pcdh gene expression are currently unknown. Here we show that the promoter of each Pcdh gene contains a gene-specific conserved control region, termed specific sequence element (SSE), located adjacent and upstream to the CSE and activates transcription together with the CSE. We purified the complex that specifically binds the SSE-CSE region and identified the CCTC binding-factor (CTCF) as a key molecule that binds and activates Pcdh promoters. Our findings point to CTCF as a factor essential for Pcdh expression and probably governing neuronal connectivity.
A mechanical turing machine: Blueprint for a biomolecular computer
Shapiro E. (2012) Interface Focus. 2, 4, p. 497-503 Abstract
We describe a working mechanical device that embodies the theoretical computing machine of Alan Turing, and as such is a universal programmable computer. The device operates on three-dimensional building blocks by applying mechanical analogues of polymer elongation, cleavage and ligation, movement along a polymer, and control by molecular recognition unleashing allosteric conformational changes. Logically, the device is not more complicated than biomolecular machines of the living cell, and all its operations are part of the standard repertoire of these machines; hence, a biomolecular embodiment of the device is not infeasible. If implemented, such a biomolecular device may operate in vivo, interacting with its biochemical environment in a program-controlled manner. In particular, it may 'compute' synthetic biopolymers and release them into its environment in response to input from the environment, a capability that may have broad pharmaceutical and biological applications.
Cell lineage analysis of the mammalian female germline
Reizel Y., Itzkovitz S., Adar R. et_al. (2012) PLoS Genetics. 8, 2, e1002477. Abstract
Fundamental aspects of embryonic and post-natal development, including maintenance of the mammalian female germline, are largely unknown. Here we employ a retrospective, phylogenetic-based method for reconstructing cell lineage trees utilizing somatic mutations accumulated in microsatellites, to study female germline dynamics in mice. Reconstructed cell lineage trees can be used to estimate lineage relationships between different cell types, as well as cell depth (number of cell divisions since the zygote). We show that, in the reconstructed mouse cell lineage trees, oocytes form clusters that are separate from hematopoietic and mesenchymal stem cells, both in young and old mice, indicating that these populations belong to distinct lineages. Furthermore, while cumulus cells sampled from different ovarian follicles are distinctly clustered on the reconstructed trees, oocytes from the left and right ovaries are not, suggesting a mixing of their progenitor pools. We also observed an increase in oocyte depth with mouse age, which can be explained either by depth-guided selection of oocytes for ovulation or by post-natal renewal. Overall, our study sheds light on substantial novel aspects of female germline preservation and development.
Recursive construction of perfect DNA molecules and libraries from imperfect oligonucleotides
Linshiz G., Yehezkel T. B. & Shapiro E. (2012) Gene Synthesis. Peccoud J.(eds.). p. 151-163 (trueMethods in Molecular Biology). Abstract
Making faultless complex objects from potentially faulty building blocks is a fundamental challenge in computer engineering, nanotechnology, and synthetic biology. We developed an error-correcting recursive construction procedure that attempts to address this challenge. Making DNA molecules from synthetic oligonucleotides using the procedure described here surpasses existing methods for de novo DNA synthesis in speed, precision, and amenability to automation. It provides for the first time a unified DNA construction platform for combining synthetic and natural DNA fragments, for constructing designer DNA libraries, and for making the faultless long synthetic DNA building blocks needed for de novo genome construction.
De novo DNA synthesis using single-molecule PCR.
Yehezkel T. B., Linshiz G. & Shapiro E. (2012) Methods in molecular biology (Clifton, N.J.). 852, p. 35-47 Abstract
The throughput of DNA reading (i.e., sequencing) has dramatically increased recently owing to the incorporation of in vitro clonal amplification. The throughput of DNA writing (i.e., synthesis) is trailing behind, with cloning and sequencing constituting the main bottleneck. To overcome this bottleneck, an in vitro alternative for in vivo DNA cloning needs to be integrated into DNA synthesis methods. Here, we show how a new single-molecule PCR (smPCR)-based procedure can be employed as a general substitute for in vivo cloning, thereby allowing for the first time in vitro DNA synthesis. We integrated this rapid and high fidelity in vitro procedure into our previously described recursive DNA synthesis and error correction procedure and used it to efficiently construct and error-correct a 1.8-kb DNA molecule from synthetic unpurified oligonucleotides, entirely in vitro. Although we demonstrate incorporating smPCR in a particular method, the approach is general and can be used, in principle, in conjunction with other DNA synthesis methods as well.

2011

Muscle-bound primordial stem cells give rise to myofiber-associated myogenic and non-myogenic progenitors
Segev E., Shefer G., Adar R., Chapal Ilani I. N., Itzkovitz S., Horovitz I., Reizel Y., Benayahu D. & Shapiro E. (2011) PLoS ONE. 6, 10, e25605. Abstract
Myofiber cultures give rise to myogenic as well as to non-myogenic cells. Whether these myofiber-associated non-myogenic cells develop from resident stem cells that possess mesenchymal plasticity or from other stem cells such as mesenchymal stem cells (MSCs) remain unsolved. To address this question, we applied a method for reconstructing cell lineage trees from somatic mutations to MSCs and myogenic and non-myogenic cells from individual myofibers that were cultured at clonal density. Our analyses show that (i) in addition to myogenic progenitors, myofibers also harbor non-myogenic progenitors of a distinct, yet close, lineage; (ii) myofiber-associated non-myogenic and myogenic cells share the same muscle-bound primordial stem cells of a lineage distinct from bone marrow MSCs; (iii) these muscle-bound primordial stem-cells first part to individual muscles and then differentiate into myogenic and non-myogenic stem cells.
Colon stem cell and crypt dynamics exposed by cell lineage reconstruction
Reizel Y., Chapal Ilani I. N., Adar R. et_al. (2011) PLoS Genetics. 7, 7, e1002192. Abstract
Stem cell dynamics in vivo are often being studied by lineage tracing methods. Our laboratory has previously developed a retrospective method for reconstructing cell lineage trees from somatic mutations accumulated in microsatellites. This method was applied here to explore different aspects of stem cell dynamics in the mouse colon without the use of stem cell markers. We first demonstrated the reliability of our method for the study of stem cells by confirming previously established facts, and then we addressed open questions. Our findings confirmed that colon crypts are monoclonal and that, throughout adulthood, the process of monoclonal conversion plays a major role in the maintenance of crypts. The absence of immortal strand mechanism in crypts stem cells was validated by the age-dependent accumulation of microsatellite mutations. In addition, we confirmed the positive correlation between physical and lineage proximity of crypts, by showing that the colon is separated into small domains that share a common ancestor. We gained new data demonstrating that colon epithelium is clustered separately from hematopoietic and other cell types, indicating that the colon is constituted of few progenitors and ruling out significant renewal of colonic epithelium from hematopoietic cells during adulthood. Overall, our study demonstrates the reliability of cell lineage reconstruction for the study of stem cell dynamics, and it further addresses open questions in colon stem cells. In addition, this method can be applied to study stem cell dynamics in other systems.
Detection of multiple disease indicators by an autonomous biomolecular computer
Gil B., Kahan-Hanum M., Skirtenko N., Adar R. & Shapiro E. (2011) Nano Letters. 11, 7, p. 2989-2996 Abstract
The promise of biomolecular computers is their ability to interact with naturally occurring biomolecules, enabling in the future the development of context-dependent programmable drugs. Here we show a context-sensing mechanism of a biomolecular automaton that can simultaneously sense different types of molecules, allowing future integration of biomedical knowledge on a broad range of molecular disease symptoms in the decision of a biomolecular computer to release a drug molecule.
Computer-aided high-throughput cloning of bacteria in liquid medium
Ben-Yehezkel T., Nagar S., Mackranrs D., Marx Z., Linshiz G., Shabi U. & Shapiro E. (2011) BioTechniques. 50, 2, p. 124-127 Abstract
Bacterial cloning was first introduced over a century ago and has since become one of the most useful procedures in biological research, perhaps paralleled in its ubiquity only by PCR and DNA sequencing. However, unlike PCR and sequencing, cloning has generally remained a manual, labor-intensive, low-throughput procedure. Here we address this issue by developing an automated, computer-aided bacterial cloning method using liquid medium that is based on the principles of (i) limiting dilution of bacteria, (ii) inference of colony forming units (CFUs) based on optical density (OD) readings, and (iii) verification of monoclonality using a mixture of differently colored fluorescently labeled bacteria for transformation. We demonstrate the high-throughput utility of this method by employing it as a cloning platform for a DNA synthesis process.
Recursive Construction and Error Correction of DNA Molecules and Libraries from Synthetic and Natural DNA
Ben-Yehezkel T., Linshiz G., Kaplan S., Gronau I., Ravid S., Adar R. & Shapiro E. (2011) Methods in Enzymology. p. 207-245 Abstract
Making error-free, custom DNA assemblies from potentially faulty building blocks is a fundamental challenge in synthetic biology. Here, we show how recursion can be used to address this challenge using a recursive procedure that constructs error-free DNA molecules and their libraries from error-prone synthetic oligonucleotides and naturally existing DNA. Specifically, we describe how divide and conquer (D&C), the quintessential recursive problem-solving technique, is applied in silico to divide target DNA sequences into overlapping, albeit error prone, oligonucleotides, and how recursive construction is applied in vitro to combine them to form error-prone DNA molecules. To correct DNA sequence errors, error-free fragments of these molecules are then identified, extracted, and used as new, typically longer and more accurate, inputs to another iteration of the recursive construction procedure; the entire process repeats until an error-free target molecule is formed. The method allows combining synthetic and natural DNA fragments into error-free designer DNA libraries, thus providing a foundation for the design and construction of complex synthetic DNA assemblies.

2010

Processing DNA molecules as text
Shabi U., Kaplan S., Linshiz G., BenYehezkel T., Buaron H., Mazor Y. & Shapiro E. (2010) Systems and Synthetic Biology. 4, 3, p. 227-236 Abstract
Polymerase Chain Reaction (PCR) is the DNA-equivalent of Gutenberg's movable type printing, both allowing large-scale replication of a piece of text. De novo DNA synthesis is the DNA-equivalent of mechanical typesetting, both ease the setting of text for replication. What is the DNA-equivalent of the word processor? Biology labs engage daily in DNA processing-the creation of variations and combinations of existing DNA-using a plethora of manual labor-intensive methods such as site-directed mutagenesis, error-prone PCR, assembly PCR, overlap extension PCR, cleavage and ligation, homologous recombination, and others. So far no universal method for DNA processing has been proposed and, consequently, no engineering discipline that could eliminate this manual labor has emerged. Here we present a novel operation on DNA molecules, called Y, which joins two DNA fragments into one, and show that it provides a foundation for DNA processing as it can implement all basic text processing operations on DNA molecules including insert, delete, replace, cut and paste and copy and paste. In addition, complicated DNA processing tasks such as the creation of libraries of DNA variants, chimeras and extensions can be accomplished with DNA processing plans consisting of multiple Y operations, which can be executed automatically under computer control. The resulting DNA processing system, which incorporates our earlier work on recursive DNA composition and error correction, is the first demonstration of a unified approach to DNA synthesis, editing, and library construction.

2009

Molecular implementation of simple logic programs
Ran T., Kaplan S. & Shapiro E. (2009) Nature Nanotechnology. 4, p. 642-648 Abstract
Autonomous programmable computing devices made of biomolecules could interact with a biological environment and be used in future biological and medical applications. Biomolecular implementations of finite automata and logic gates have already been developed. Here, we report an autonomous programmable molecular system based on the manipulation of DNA strands that is capable of performing simple logical deductions. Using molecular representations of facts such as Man(Socrates) and rules such as Mortal(X)Man(X) (Every Man is Mortal), the system can answer molecular queries such as Mortal(Socrates)? (Is Socrates Mortal?) and Mortal(X)? (Who is Mortal?). This biomolecular computing system compares favourably with previous approaches in terms of expressive power, performance and precision. A compiler translates facts, rules and queries into their molecular representations and subsequently operates a robotic system that assembles the logical deductions and delivers the result. This prototype is the first simple programming language with a molecular-scale implementation.

2008

De novo DNA synthesis using single molecule PCR.
Ben-Yehezkel T., Linshiz G., Buaron H., Kaplan S., Shabi U. & Shapiro E. (2008) Nucleic Acids Research. 36, 17, p. e107 Abstract
The throughput of DNA reading (sequencing) has dramatically increased recently due to the incorporation of in vitro clonal amplification. The throughput of DNA writing (synthesis) is trailing behind, with cloning and sequencing constituting the main bottleneck. To overcome this bottleneck, an in vitro alternative for in vivo DNA cloning must be integrated into DNA synthesis methods. Here we show how a new single molecule PCR (smPCR)-based procedure can be employed as a general substitute to in vivo cloning thereby allowing for the first time in vitro DNA synthesis. We integrated this rapid and high fidelity in vitro procedure into our earlier recursive DNA synthesis and error correction procedure and used it to efficiently construct and error-correct a 1.8-kb DNA molecule from synthetic unpurified oligos completely in vitro. Although we demonstrate incorporating smPCR in a particular method, the approach is general and can be used in principle in conjunction with other DNA synthesis methods as well.
Cell biology: RNA computing in a living cell
Shapiro E. & Gil B. (2008) Science. 322, 5900, p. 387-388 Abstract
A system based on RNA can perform simple logical computations within a living cell, marking a step toward programming cell behavior.
Cell lineage analysis of a mouse tumor
Frumkin D., Wasserstrom A., Itzkovitz S., Stern T., Harmelin A., Eilam R., Rechavi G. & Shapiro E. (2008) Cancer Research. 68, 14, p. 5924-5931 Abstract
Revealing the lineage relations among cancer cells can shed light on tumor growth patterns and metastasis formation, yet cell lineages have been difficult to come by in the absence of a suitable method. We previously developed a method for reconstructing cell lineage trees from genomic variability caused by somatic mutations. Here, we apply the method to cancer and reconstruct, for the first time, a lineage tree of neoplastic and adjacent normal cells obtained by laser microdissection from tissue sections of a mouse lymphoma. Analysis of the reconstructed tree reveals that the tumor initiated from a single founder cell, similar to 5 months before diagnosis, that the tumor grew in a physically coherent manner, and that the average number of cell divisions accumulated in cancerous cells was almost twice than in adjacent normal lung epithelial cells but slightly less than the expected figure for normal B lymphocytes. The cells were also genotyped at the TP53 locus, and neoplastic cells were found to share a common mutation, which was most likely present in a heterozygous state. Our work shows that the ability to obtain data regarding the physical appearance, precise anatomic position, genotypic profile, and lineage position of single cells may be useful for investigating cancer development, progression, and interaction with the microenvironment.
Using expression profiles of Caenorhabditis elegans neurons to identify genes that mediate synaptic connectivity
Baruch L., Itzkovitz S., Golan Mashiach M. M., Shapiro E. & Segal E. (2008) PLoS Computational Biology. 4, 7, e1000120. Abstract
Synaptic wiring of neurons in Caenorhabditis elegans is largely invariable between animals. It has been suggested that this feature stems from genetically encoded molecular markers that guide the neurons in the final stage of synaptic formation. Identifying these markers and unraveling the logic by which they direct synapse formation is a key challenge. Here, we address this task by constructing a probabilistic model that attempts to explain the neuronal connectivity diagram of C. elegans as a function of the expression patterns of its neurons. By only considering neuron pairs that are known to be connected by chemical or electrical synapses, we focus on the final stage of synapse formation, in which neurons identify their designated partners. Our results show that for many neurons the neuronal expression map of C. elegans can be used to accurately predict the subset of adjacent neurons that will be chosen as its postsynaptic partners. Notably, these predictions can be achieved using the expression patterns of only a small number of specific genes that interact in a combinatorial fashion.
Geometric constraints on neuronal connectivity facilitate a concise synaptic adhesive code
Itzkovitz S., Baruch L., Shapiro E. & Segal E. (2008) Proceedings of the National Academy of Sciences of the United States of America. 105, 27, p. 9278-9283 Abstract
The nervous system contains trillions of neurons, each forming thousands of synaptic connections. It has been suggested that this complex connectivity is determined by a synaptic "adhesive code," where connections are dictated by a variable set of cell surface proteins, combinations of which form neuronal addresses. The estimated number of neuronal addresses is orders of magnitude smaller than the number of neurons. Here, we show that the limited number of addresses dictates constraints on the possible neuronal network topologies. We show that to encode arbitrary networks, in which each neuron can potentially connect to any other neuron, the number of neuronal addresses needed scales linearly with network size. In contrast, the number of addresses needed to encode the wiring of geometric networks grows only as the square root of network size. The more efficient encoding in geometric networks is achieved through the reutilization of the same addresses in physically independent portions of the network. We also find that ordered geometric networks, in which the same connectivity patterns are iterated throughout the network, further reduce the required number of addresses. We demonstrate our findings using simulated networks and the C. elegans neuronal network. Geometric neuronal connectivity with recurring connectivity patterns have been suggested to confer an evolutionary advantage by saving biochemical resources on the one hand and reutilizing functionally efficient neuronal circuits. Our study suggests an additional advantage of these prominent topological features - the facilitation of the ability to genetically encode neuronal networks given constraints on the number of addresses.
Towards molecular computers that operate in a biological environment
Kahan M., Gil B., Adar R. & Shapiro E. (2008) Physica D-Nonlinear Phenomena. 237, 9, p. 1165-1172 Abstract
Even though electronic computers are the only computer species we are accustomed to, the mathematical notion of a programmable computer has nothing to do with electronics. In fact, Alan Turing's notional computer [L.M. Turing, On computable numbers, with an application to the entcheidungsproblem, Proc. Lond. Math. Soc. 42 (1936) 230-265], which marked in 1936 the birth of modem computer science and still stands at its heart, has greater similarity to natural biomolecular machines such as the ribosome and polymerases than to electronic computers. This similarity led to the investigation of DNA-based computers [C.H. Bennett, The thermodynamics of computation - Review, Int. J. Theoret. Phys. 21 (1982) 905-940; A.M. Adleman, Molecular computation of solutions to combinatorial problems, Science 266 (1994) 1021-1024]. Although parallelism, sequence specific hybridization and storage capacity, inherent to DNA and RNA molecules, can be exploited in molecular computers to solve complex mathematical problems [Q. Ouyang, et a]., DNA solution of the maximal clique problem, Science 278 (1997) 446-449; R.J. Lipton, DNA solution of hard computational problems, Science 268 (1995) 542-545; R.S. Braich, et al., Solution of a 20-variable 3-SAT problem on a DNA computer, Science 296 (2002) 499-502; Liu Q., et al., DNA computing on surfaces, Nature 403 (2000) 175-179; D. Faulhammer, et al., Molecular computation: RNA solutions to chess problems, Proc. Natl. Acad. Sci. USA 97 (2000) 1385-1389; C. Mao, et al., Logical computation using algorithmic self-assembly of DNA triple-crossover molecules, Nature 407 (2000) 493-496; A.J. Ruben, et al., The past, present and future of molecular computing, Nat. Rev. Mol. Cell. Biol. 1 (2000) 69-72], we believe that the more significant potential of molecular computers lies in their ability to interact directly with a biochemical environment such as the bloodstream and living cells. From this perspective, even simple molecular computations may have important consequences when performed in a proper context. We envision that molecular computers that operate in a biological environment can be the basis of \u201csmart drugs\u201d, which are potent drugs that activate only if certain environmental conditions hold. These conditions could include abnormalities in the molecular composition of the biological environment that are indicative of a particular disease. Here we review the research direction that set this vision and attempts to realize it.
Recursive construction of perfect DNA molecules from imperfect oligonucleotides
Linshiz G., Ben-Yehezkel T., Kaplan S., Gronau I., Ravid S., Adar R. & Shapiro E. (2008) Molecular Systems Biology. 4, 191. Abstract
Making faultless complex objects from potentially faulty building blocks is a fundamental challenge in computer engineering, nanotechnology and synthetic biology. Here, we show for the first time how recursion can be used to address this challenge and demonstrate a recursive procedure that constructs error-free DNA molecules and their libraries from error-prone oligonucleotides. Divide and Conquer (D&C), the quintessential recursive problem-solving technique, is applied in silico to divide the target DNA sequence into overlapping oligonucleotides short enough to be synthesized directly, albeit with errors; error-prone oligonucleotides are recursively combined in vitro, forming error-prone DNA molecules; error-free fragments of these molecules are then identified, extracted and used as new, typically longer and more accurate, inputs to another iteration of the recursive construction procedure; the entire process repeats until an error-free target molecule is formed. Our recursive construction procedure surpasses existing methods for de novo DNA synthesis in speed, precision, amenability to automation, ease of combining synthetic and natural DNA fragments, and ability to construct designer DNA libraries. It thus provides a novel and robust foundation for the design and construction of synthetic biological molecules and organisms.
Estimating cell depth from somatic mutations
Wasserstrom A., Frumkin D., Adar R. et_al. (2008) PLoS Computational Biology. 4, 5, 1000058. Abstract
The depth of a cell of a multicellular organism is the number of cell divisions it underwent since the zygote, and knowing this basic cell property would help address fundamental problems in several areas of biology. At present, the depths of the vast majority of human and mouse cell types are unknown. Here, we show a method for estimating the depth of a cell by analyzing somatic mutations in its microsatellites, and provide to our knowledge for the first time reliable depth estimates for several cells types in mice. According to our estimates, the average depth of oocytes is 29, consistent with previous estimates. The average depth of B cells ranges from 34 to 79, linearly related to the mouse age, suggesting a rate of one cell division per day. In contrast, various types of adult stem cells underwent on average fewer cell divisions, supporting the notion that adult stem cells are relatively quiescent. Our method for depth estimation opens a window for revealing tissue turnover rates in animals, including humans, which has important implications for our knowledge of the body under physiological and pathological conditions.
Reconstruction of Cell Lineage Trees in Mice
Wasserstrom A., Adar R., Shefer G. et_al. (2008) PLoS ONE. 3, 4, 1939. Abstract
The cell lineage tree of a multicellular organism represents its history of cell divisions from the very first cell, the zygote. A new method for high-resolution reconstruction of parts of such cell lineage trees was recently developed based on phylogenetic analysis of somatic mutations accumulated during normal development of an organism. In this study we apply this method in mice to reconstruct the lineage trees of distinct cell types. We address for the first time basic questions in developmental biology of higher organisms, namely what is the correlation between the lineage relation among cells and their (1) function, (2) physical proximity and (3) anatomical proximity. We analyzed B-cells, kidney-, mesenchymal- and hematopoietic-stem cells, as well as satellite cells, which are adult skeletal muscle stem cells isolated from their niche on the muscle fibers (myofibers) from various skeletal muscles. Our results demonstrate that all analyzed cell types are intermingled in the lineage tree, indicating that none of these cell types are single exclusive clones. We also show a significant correlation between the physical proximity of satellite cells within muscles and their lineage. Furthermore, we show that satellite cells obtained from a single myofiber are significantly clustered in the lineage tree, reflecting their common developmental origin. Lineage analysis based on somatic mutations enables performing high resolution reconstruction of lineage trees in mice and humans, which can provide fundamental insights to many aspects of their development and tissue maintenance.
Amplification of multiple genomic loci from single cells isolated by laser micro-dissection of tissues
Frumkin D., Wasserstrom A., Itzkovitz S., Harmelin A., Rechavi G. & Shapiro E. (2008) BMC Biotechnology. 8, 17. Abstract
Background: Whole genome amplification (WGA) and laser assisted micro-dissection represent two recently developed technologies that can greatly advance biological and medical research. WGA allows the analysis of multiple genomic loci from a single genome and has been performed on single cells from cell suspensions and from enzymatically-digested tissues. Laser micro-dissection makes it possible to isolate specific single cells from heterogeneous tissues. Results: Here we applied for the first time WGA on laser micro-dissected single cells from stained tissue sections, and developed a protocol for sequentially performing the two procedures. The combined procedure allows correlating the cell's genome with its natural morphology and precise anatomical position. From each cell we amplified 122 genomic and mitochondrial loci. In cells obtained from fresh tissue sections, 64.5% of alleles successfully amplified to ∼700000 copies each, and mitochondrial DNA was amplified successfully in all cells. Multiplex PCR amplification and analysis of cells from pre-stored sections yielded significantly poorer results. Sequencing and capillary electrophoresis of WGA products allowed detection of slippage mutations in microsatellites (MS), and point mutations in P53. Conclusion: Comprehensive genomic analysis of single cells from stained tissue sections opens new research opportunities for cell lineage and depth analyses, genome-wide mutation surveys, and other single cell assays.

2007

A universal mechanism ties genotype to phenotype in trinucleotide diseases
Kaplan S., Itzkovitz S. & Shapiro E. (2007) PLoS Computational Biology. 3, 11, p. 2291-2298 Abstract
Trinucleotide hereditary diseases such as Huntington disease and Friedreich ataxia are cureless diseases associated with inheriting an abnormally large number of DNA trinucleotide repeats in a gene. The genes associated with different diseases are unrelated and harbor a trinucleotide repeat in different functional regions; therefore, it is striking that many of these diseases have similar correlations between their genotype, namely the number of inherited repeats and age of onset and progression phenotype. These correlations remain unexplained despite more than a decade of research. Although mechanisms have been proposed for several trinucleotide diseases, none of the proposals, being disease-specific, can account for the commonalities among these diseases. Here, we propose a universal mechanism in which length-dependent somatic repeat expansion occurs during the patient's lifetime toward a pathological threshold. Our mechanism uniformly explains for the first time to our knowledge the genotype-phenotype correlations common to trinucleotide disease and is well-supported by both experimental and clinical data. In addition, mathematical analysis of the mechanism provides simple explanations to a wide range of phenomena such as the exponential decrease of the age-of-onset curve, similar onset but faster progression in patients with Huntington disease with homozygous versus heterozygous mutation, and correlation of age of onset with length of the short allele but not with the long allele in Friedreich ataxia. If our proposed universal mechanism proves to be the core component of the actual mechanisms of specific trinucleotide diseases, it would open the search for a uniform treatment for all these diseases, possibly by delaying the somatic expansion process.
Biotechnology: Logic goes in vitro
Shapiro E. & Gil B. (2007) Nature Nanotechnology. 2, p. 84-85 Abstract

2006

Bringing DNA computers to life
Shapiro E. & Benenson Y. (2006) Scientific American. 17, 3s, p. 45-51 Abstract
Mathematician Alan Turing envisioned the properties of a mechanical computer in 1936, long before molecule-scale machines within cells could be seen and studied. Tapping the computing power of biological molecules gives rise to tiny machines that can speak directly to living cells. This new computer species will prove itself valuable for a wide range of applications, such as disease detection(.)
Tapping the computing power of biological molecules gives rise to tiny machines that can speak directly to living cells.
Shapiro E. & Benenson Y. (2006) Scientific American. 294, 5, p. 44-51 Abstract
Molekularny komputer lekarzem komórek
Shapiro E. & Benenson Y. (2006) Świat Nauki. 6, p. 50-57 Abstract
techniki informatyczne; bioinzynieria; komputery molekularne; mechanizm dzialania; nowosci techniczne
Computadores de ADN
Shapiro E. & Benenson Y. (2006) Investigación y ciencia. 358, p. 14-21 Abstract
Laboratory for Biomolecular Computers
Shapiro E., Adar R., Ben-Yehezkel T. et_al. (2006) Life Science Open Day. Abstract

2005

Genomic variability within an organism exposes its cell lineage tree
Frumkin D., Wasserstrom A., Kaplan S., Feige U. & Shapiro E. (2005) PLoS Computational Biology. 1, 5, p. 382-394 Abstract
What is the lineage relation among the cells of an organism? The answer is sought by developmental biology, immunology, stem cell research, brain research, and cancer research, yet complete cell lineage trees have been reconstructed only for simple organisms such as Caenorhabditis elegans. We discovered that somatic mutations accumulated during normal development of a higher organism implicitly encode its entire cell lineage tree with very high precision. Our mathematical analysis of known mutation rates in microsatellites (MSs) shows that the entire cell lineage tree of a human embryo, or a mouse, in which no cell is a descendent of more than 40 divisions, can be reconstructed from information on somatic MS mutations alone with no errors, with probability greater than 99.95%. Analyzing all ;1.5 million MSs of each cell of an organism may not be practical at present, but we also show that in a genetically unstable organism, analyzing only a few hundred MSs may suffice to reconstruct portions of its cell lineage tree. We demonstrate the utility of the approach by reconstructing cell lineage trees from DNA samples of a human cell line displaying MS instability. Our discovery and its associated procedure, which we have automated, may point the way to a future ''Human Cell Lineage Project'' that would aim to resolve fundamental open questions in biology and medicine by reconstructing ever larger portions of the human cell lineage tree. Copyright:

2004

BioAmbients: An abstraction for biological compartments
Regev A., Panina E., Silverman W., Cardelli L. & Shapiro E. (2004) Theoretical Computer Science. 325, 1, p. 141-167 Abstract
Biomolecular systems, composed of networks of proteins, underlie the major functions of living cells. Compartments are key to the organization of such systems. We have previously developed an abstraction for biomolecular systems using the pi-calculus process algebra, which successfully handled their molecular and biochemical aspects, but provided only a limited solution for representing compartments. In this work, we extend this abstraction to handle compartments. We are motivated by the ambient calculus, a process algebra for the specification of process location and movement through computational domains. We present the BioAmbients calculus, which is suitable for representing various aspects of molecular localization and compartmentalization, including the movement of molecules between compartments, the dynamic rearrangement of cellular compartments, and the interaction between molecules in a compartmentalized setting. Guided by the calculus, we adapt the BioSpi simulation system, to provide an extended modular framework for molecular and cellular compartmentalization, and we use it to model and study a complex multi-cellular system.
Stochastic computing with biomolecular automata
Adar R., Benenson Y., Linshiz G., Rosner A., Tishby N. & Shapiro E. (2004) Proceedings of the National Academy of Sciences of the United States of America. 101, 27, p. 9960-9965 Abstract
Stochastic computing has a broad range of applications, yet electronic computers realize its basic step, stochastic choice between alternative computation paths, in a cumbersome way. Biomolecular computers use a different computational paradigm and hence afford novel designs. We constructed a stochastic molecular automaton in which stochastic choice is realized by means of competition between alternative biochemical pathways, and choice probabilities are programmed by the relative molar concentrations of the software molecules coding for the alternatives. Programmable and autonomous stochastic molecular automata have been shown to perform direct analysis of disease-related molecular indicators in vitro and may have the potential to provide in situ medical diagnosis and cure.
An autonomous molecular computer for logical control of gene expression
Benenson Y., Gil B. (., Ben-Dor U., Adar R. & Shapiro E. (2004) Nature. 429, 6990, p. 423-429 Abstract
Early biomolecular computer research focused on laboratory-scale, human-operated computers for complex computational problems. Recently, simple molecular-scale autonomous programmable computers were demonstrated allowing both input and output information to be in molecular form. Such computers, using biological molecules as input data and biologically active molecules as outputs, could produce a system for 'logical' control of biological processes. Here we describe an autonomous biomolecular computer that, at least in vitro, logically analyses the levels of messenger RNA species, and in response produces a molecule capable of affecting levels of gene expression. The computer operates at a concentration of close to a trillion computers per microlitre and consists of three programmable modules: a computation module, that is, a stochastic molecular automaton; an input module, by which specific mRNA levels or point mutations regulate software molecule concentrations, and hence automaton transition probabilities; and an output module, capable of controlled release of a short single-stranded DNA molecule. This approach might be applied in vivo to biochemical sensing, genetic engineering and even medical diagnosis and treatment. As a proof of principle we programmed the computer to identify and analyse mRNA of disease-related genes associated with models of small-cell lung cancer and prostate cancer, and to produce a single-stranded DNA molecule modelled after an anticancer drug.

Accepted version
Invited Talk: Injecting Life with Computers
Shapiro E. (2004) Proceedings of the 2004 ACM SIGPLAN Symposium on Partial Evaluation and Semantics-based Program Manipulation. New York, NY, USA: . p. 200-200 (truePEPM '04). Abstract
Although electronic computers are the only "computer species" we are accustomed to, the mathematical notion of a programmable computer has nothing to do with wires and logic gates. In fact, Alan Turing's notional computer, which marked in 1936 the birth of modern computer science and still stands at its heart, has greater similarity to natural biomolecular machines such as the ribosome and polymerases than to electronic computers. Recently, a new "computer species" made of biological molecules has emerged. These simple molecular computers inspired by the Turing machine, of which a trillion can fit into a microliter, do not compete with electronic computers in solving complex computational problems; their potential lies elsewhere. Their molecular scale and their ability to interact directly with the biochemical environment in which they operate suggest that in the future they may be the basis of a new kind of "smart drugs": molecular devices equipped with the medical knowledge to perform disease diagnosis and therapy inside the living body. They would detect and diagnose molecular disease symptoms and, when necessary, administer the requisite drug molecules to the cell, tissue or organ in which they operate. In the talk we review this new research direction and report on preliminary steps carried out in our lab towards realizing its vision.
Algorithmic Program Debugging
Shapiro E. (2004) Abstract
Productively combines elements of programming languages, environments, logic, and inductive inference to produce effective debugging aids. Its use of the PROLOG language provides an efficient implementation of the debugging algorithms.
Molecular Computing Machines
Benenson Y. & Shapiro E. (2004) Dekker Encyclopedia of Nanoscience and Nanotechnology. p. 2043-2055 Abstract
Biopolymers such as nucleic acids and proteins encode biological data and may be viewed as strings of chemical letters. While electronic computers manipulate strings of 0s and 1s encoded in electric signals, biologically encoded data might, in principle, be manipulated by biochemical means. During the last decade, several approaches to compute with biomolecules were developed, and the field has become known as biomolecular or DNA computing. The approaches varied widely with respect to the model of computation they employed, the problems they attempted to solve, and the degree of human intervention. One approach focused on the application of the Turing machine model and, more generally, string-processing automata to biomolecular information processing. Its goal is to construct computers made of biomolecules that are capable of autonomous conversion of an input dataencoding molecule to an output molecule according to a set of rules defined by a molecular program. Here we survey the field of biomolecular computing machines and discuss possible future directions.
The pi-calculus as an abstraction for biomolecular systems
Regev A. & Shapiro E. (2004) Modelling in Molecular Biology. Ciobanu G. & Rozenberg G.(eds.). p. 219-266 (trueNATURAL COMPUTING SERIES). Abstract
Biochemical processes, carried out by networks of proteins, underlies the major functions of living cells ([8, 60]). Although such systems are the focus of intensive experimental research, the mountains of knowledge about the function, activity, and interaction of molecular systems in cells remain fragmented. While computational methods are key to addressing this challenge ([8, 60]), they require the adoption of a meaningful mathematical abstraction [50s]. The research of biomolecular systems has yet to identify and adopt such a unifying abstraction.

2003

DNA molecule provides a computing machine with both data and fuel
Benenson Y., Adar R., Paz-Elizur T., Livneh Z. & Shapiro E. (2003) Proceedings of the National Academy of Sciences of the United States of America. 100, 5, p. 2191-2196 Abstract
The unique properties of DNA make it a fundamental building block in the fields of supramolecular chemistry, nanotechnology, nano-circuits, molecular switches, molecular devices, and molecular computing. In our recently introduced autonomous molecular automaton, DNA molecules serve as input, output, and software, and the hardware consists of DNA restriction and ligation enzymes using ATP as fuel. In addition to information, DNA stores energy, available on hybridization of complementary strands or hydrolysis of its phosphodiester backbone, Here we show that a single DNA molecule can provide both the input data and all of the necessary fuel for a molecular automaton. Each computational step of the automaton consists of a reversible software molecule/input molecule hybridization followed by an irreversible software-directed cleavage of the input molecule, which drives the computation forward by increasing entropy and releasing heat. The cleavage uses a hitherto unknown capability of the restriction enzyme Fokl, which serves as the hardware, to operate on a noncovalent software/input hybrid. In the previous automaton, software/input ligation consumed one software molecule and two ATP molecules per step. As ligation is not performed in this automaton, a fixed amount of software and hardware molecules can, in principle, process any input molecule of any length without external energy supply. Our experiments demonstrate 3 × 10¹² automata per μl performing 6.6 × 10¹⁰ transitions per second per μl with transition fidelity of 99.9%, dissipating about 5 × 10^-9 W/μl as heat at ambient temperature.
Cells as computation
Regev A. & Shapiro E. (2003) Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). Priami C.(eds.). p. 1-3 Abstract
Cells as computation (Reprinted from Nature, vol 419, pg 343, 2002)
Regev A. & Shapiro E. (2003) Computational Methods In Systems Biology, Proceedings. 2602, p. 1-3 Abstract
Keywords: Computer Science, Interdisciplinary Applications; Computer Science, Theory & Methods

2002

Are black boxes transparent? - High school students' strategies of using abstract data types
Haberman B., Shapiro E. & Scherz Z. (2002) Journal of Educational Computing Research. 27, 4, p. 411-436 Abstract
For the last ten years the concept of abstract data types (ADTs) has been introduced to Israeli high school students who take the courses "Logic Programming" and "Introduction to Artificial Intelligence" as part of their Computer Science (CS) curriculum. We developed an instructional model designed to introduce ADT as a formal CS concept and as a tool for developing computer programs. To implement the abstract data types, we used "black boxes." Here we describe our research whose aim was to assess the cognitive aspects of using ADTs as tools for knowledge representation and problem solving by different student populations. Our research findings indicated that many students' perceptions of the ADT concept were compatible to its formal CS definition, and they used black boxes transparently to develop Prolog programs. However, there were also students who adopted their own strategies for using ADTs: a) using ADTs only as a graphical description to organize data and knowledge; and b) using "ADT black boxes" in alternative ways, while violating the principle of information hiding - duplication, rewriting, and externalization. The findings also indicated that "ADT black boxes" were beneficial in helping beginners to develop correctly written Prolog programs.
Cells as computation
Regev A. & Shapiro E. (2002) Nature. 419, 6905, p. 343 Abstract

Accepted version

2001

Programmable and autonomous computing machine made of biomolecules
Benenson Y., Paz-Elizur T., Adar R., Keinan E., Livneh Z. & Shapiro E. (2001) Nature. 414, 6862, p. 430-434 Abstract
Devices that convert information from one form into another according to a definite procedure are known as automata. One such hypothetical device is the universal Turing machine¹, which stimulated work leading to the development of modern computers. The Turing machine and its special cases², including finite automata³, operate by scanning a data tape, whose striking analogy to information-encoding biopolymers inspired several designs for molecular DNA computers^4-8. Laboratory-scale computing using DNA and human-assisted protocols has been demonstrated^9-15, but the realization of computing devices operating autonomously on the molecular scale remains rare^16-20. Here we describe a programmable finite automaton comprising DNA and DNA-manipulating enzymes that solves computational problems autonomously. The automaton's hardware consists of a restriction nuclease and ligase, the software and input are encoded by double-stranded DNA, and programming amounts to choosing appropriate software molecules. Upon mixing solutions containing these components, the automaton processes the input molecule via a cascade of restriction, hybridization and ligation cycles, producing a detectable output molecule that encodes the automaton's final state, and thus the computational result. In our implementation 10¹² automata sharing the same software run independently and in parallel on inputs (which could, in principle, be distinct) in 120 μl solution at room temperature at a combined rate of 10⁹ transitions per second with a transition fidelity greater than 99.8%, consuming less than 10^-10 W.

Accepted version
Application of a stochastic name-passing calculus to representation and simulation of molecular processes
Priami C., Regev A., Shapiro E. & Silverman W. (2001) Information Processing Letters. 80, 1, p. 25-31 Abstract
We describe a novel application of a stochastic name-passing calculus for the study of biomolecular systems. We specify the structure and dynamics of biochemical networks in a variant. of the stochastic pi -calculus, yielding a model which is mathematically well-defined and biologically faithful. We adapt the operational semantics of the calculus to account for both the time and probability of biochemical reactions, and present a computer implementation of the calculus for biochemical simulations.

Submitted version
Representation and simulation of biochemical processes using the pi-calculus process algebra.
Regev A., Silverman W. & Shapiro E. (2001) Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing. p. 459-470 Abstract
Despite the rapidly accumulating body of knowledge about protein networks, there is currently no convenient way of sharing and manipulation of such information. We suggest that a formal computer language for describing the biomolecular processes underlying protein networks is essential for rapid advancement in this field. We propose to model biomolecular processes by using the pi-Calculus, a process algebra, originally developed for describing computer processes. Our model for biochemical processes is mathematically well-defined, while remaining biologically faithful and transparent. It is amenable to computer simulation, analysis and formal verification. We have developed a computer simulation system, the PiFCP, for execution and analysis of pi-calculus programs. The system allows us to trace, debug and monitor the behavior of biochemical networks under various manipulations. We present a pi-calculus model for the RTK-MAPK signal transduction pathway, formally represent detailed molecular and biochemical information, and study it by various PiFCP simulations.

1998

Abstract data types as a project development organiser
Haberman B., Scherz Z. & Shapiro E. (1998) SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education). 30, 3, p. 102-104 Abstract
Abstract Data Types (ADTs) has been introduced recently to Israeli high-school students as tools for problem solving and knowledge representation in Prolog environment. This paper reviews a research that was conducted in order to assess the influence of students' use of ADTs on the development process of knowledge-based Prolog programs.
Abstract data types as a project development organiser
Haberman B., Scherz Z. & Shapiro E. (1998) SIGCSE Bulletin (Association for Computing Machinery, Special Interest Group on Computer Science Education). 30, 3, p. 102-104 Abstract
Abstract Data Types (ADTs) has been introduced recently to Israeli high-school students as tools for problem solving and knowledge representation in Prolog environment. This paper reviews a research that was conducted in order to assess the influence of students' use of ADTs on the development process of knowledge-based Prolog programs.
Development, implementation and evaluation of a course in expert systems for high-school students (poster)
Ragonis N., Shapiro E., Ben-Ari M. & Scherz Z. (1998) SIGMOD Record. 30, 3, p. 300 Abstract

1995

On the structural simplicity of machines and languages
Moscowitz Y. & Shapiro E. (1995) Annals of Mathematics and Artificial Intelligence. 15, 3-4, p. 379-405 Abstract
We employ an algebraic method for comparing the structural simplicity of families of abstract machines. We associate with each family of machines a first-order structure that includes machines as objects, composition operations, which construct larger machines from smaller ones, as functions, and a set of semantic relations. We then compare families of machines by studying the existence of homomorphisms between the associated structures. Given families of machines L, L with associated structures S, S, we say that L is simpler than L if there is a homomorphism of S into S, but not vice versa. We show that across several abstract machine models - finite automata, Turing machines, and logic programs - deterministic machines are simpler than nondeterministic machines and nondeterministic machines are simpler than alternating machines. Our results cross computational complexity boundaries. We show that for Turing machines, finite automata and logic programs every non-deterministic variant is not simpler than any deterministic variant, and that every alternating variant is not simpler than any nondeterministic (non-alternating) variant. In addition, we show that nondeterministic and alternating Turing machines are equivalent in their structural simplicity to iterative and ordinary logic programs, respectively, and that deterministic Turing machines are as simple as deterministic logic programs. We thus establish, in a sense somewhat independent of computational complexity classes and machine models, that determinism is simpler than nondeterminism and that nondeterminism is simpler than alternation. The method of comparison we employ is quite general. It has been applied successfully to the comparison of concurrent programming languages and we expect it to be applicable to other languages and machine models as well.

1994

The Art of Prolog: Advanced Programming Techniques
Sterling L. & Shapiro E. (1994) (trueLogic Programming). Abstract
This new edition of The Art of Prolog contains a number of important changes. Most background sections at the end of each chapter have been updated to take account of important recent research results, the references have been greatly expanded, and more advanced exercises have been added which have been used successfully in teaching the course.Part II, The Prolog Language, has been modified to be compatible with the new Prolog standard, and the chapter on program development has been significantly altered: the predicates defined have been moved to more appropriate chapters, the section on efficiency has been moved to the considerably expanded chapter on cuts and negation, and a new section has been added on stepwise enhancementa systematic way of constructing Prolog programs developed by Leon Sterling. All but one of the chapters in Part III, Advanced Prolog Programming Techniques, have been substantially changed, with some major rearrangements. A new chapter on interpreters describes a rule language and interpreter for expert systems, which better illustrates how Prolog should be used to construct expert systems. The chapter on program transformation is completely new and the chapter on logic grammars adds new material for recognizing simple languages, showing how grammars apply to more computer science examples.

1993

Concurrent Logic Languages are Truly Object-Oriented
Shapiro E. (1993) IEEE Software. 10, 2, p. 11-11 Abstract
The 5th Generation Project - Epilogue
Shapiro E. & Warren D. H. (1993) Communications of the ACM. 36, 3, p. 100-101 Abstract
The 5th Generation Project - Introducation
Shapiro E. & Warren D. H. D. (1993) Communications of the ACM. 36, 3, p. 46-49 Abstract
Structured Messages, Agents, and Workllows
Goldberg Y., Safran M., Silverman W. & Shapiro E. (1993) Reasings in groupware and computer-supported cooperative work. Beacker R. M.(eds.). p. 457-518 Abstract
One of the key problems when any group of people cooperates to solve problems or make decisions is how to share information. Thus one of the central goals of designing good" organizational interfaces"(Malone, 1985) should be to help people share information in groups and organizations. In this chapter, we describe aprototype system, called the Information Lens, that focuses on one aspect of this problem: how to help people share the many diverse kinds of qualitative information that are communicated via electronic messaging systems. We also show how the same general capabilities that help with information sharing can be used to support a variety of more specific coordination processes such as task tracking and meeting scheduling.

1992

Active mail - a framework for implementing groupware
Goldberg Y., Safran M. & Shapiro E. (1992) Proceedings of the Conference on Computer-Supported Cooperative Work. p. 75-83 (trueProceedings of the Conference on Computer-Supported Cooperative Work). Abstract
Most existing groupware products are either too passive or very intrusive. They either passively wait for user action or actively interfere with normal workstation activity by intruding on the user's screen; they are one-sided push or pull mechanisms. A system for computer-mediated interaction, Active Mail obviates the dilemma with a protocol which enables a groupware application to involve a new user in a way that is non-intrusive, tolerates delayed response, and requires little effort on the user's part. Active Mail piggybacks on ordinary electronic mail, retaining all the features that have made it so successful. Active Mail messages are used to establish persistent interactive connections among a group of users. Receivers of Active Mail messages can interact with the sender, with future recipients, and with remote, distributed multi-user applications. Groupware applications realized within the Active Mail framework include a text conversation tool, a collaborative writing facility with a floor passing protocol and revision control management, an interactive meeting scheduler, and some distributed multi-user interactive games. In this paper we describe the architecture of Active Mail, present some of its applications, and discuss our preliminary experience with it.
Spatial machines: a more realistic approach to parallel computation
Feldman Y. (. & Shapiro E. (1992) Communications of the ACM. 35, 10, p. 60-73 Abstract
Embeddings among concurrent programming - languages\u200f (Preliminary Version)
Shapiro E. (1992) CONCUR '92. Cleaveland W. R.(eds.). p. 486-503 Abstract
We relate several well-known concurrent programming languages by demonstrating mappings among them that are homomorphic with respect to parallel composition and preserve fully-abstract semantic distinctions. The results are presented within a general framework for language comparison based on language embeddings, and complement our earlier negative results presented within this framework. Together, the positive and negative results induce a nontrivial preordering on the family of concurrent programming languages that quite often coincides with previous intuitions on the 'expressive power'' of these languages. Our results reveal interesting connections between hitherto unrelated concurrent languages and models of concurrency, and provide evidence to the viability of the theory of structural simplicity as a formal framework for the study of programming language expressiveness.
Logic programs with inheritance
GOLDBERG Y., SILVERMAN W. & SHAPIRO E. (1992) Fifth Generation Computer Systems 1992, Vols 1 and 2. p. 951-960 Abstract

1991

A type system for logic programs
Yardeni E. & Shapiro E. (1991) The Journal of Logic Programming. 10, 2, p. 125-153 Abstract
A theory for a type system for logic programs is developed which addressesthe question of well-typing, type inference, and compile-time and run-time type checking. A type is a recursively enumerable set of ground atoms, which is tuple-distributive. The association of a type to a program is intended to mean that only ground atoms that are elements of the type may be derived from the program. A declarative definition of well-typed programs is formulated, based on an intuitive approach related to the fixpoint semantics of logic programs. Whether a program is well typed is undecidable in general. We define a restricted class of types, called regular types, for which type checking is decidable. Regular unary logic programs are proposed as a specification language for regular types. An algorithm for type-checking a logic program with respect to a regular type definition is described, and its complexity is analyzed. Finally, the practicality of the type system is discussed, and some examples are shown. The type system has been implemented in FCP for FCP and is incorporated in the Logix system.
Communication with directed logic variables
Kleinman A., Moscowitz Y., Pnueli A. & Shapiro E. (1991) Conference Record of the Annual ACM Symposium on Principles of Programming Languages. p. 221-232 (trueConference Record of the Annual ACM Symposium on Principles of Programming Languages). Abstract
A directed logic variable is a two-port communication channel that can transmit at most one message, which may contain embedded ports. The ability to send both input and output ports in messages allows directed logic variables to implement other communication primitives such as streams and remote procedure calls; to realize dynamic process migration; and to support sophisticated mail and window system protocols. Hence directed logic variables can be useful as an implementation layer for concurrent languages and as an extension to sequential procedural languages. We provide an abstract, programming language independent, operational semantics for communication with directed logic variables and develop effective distributed implementations of it. Our implementation algorithms ensure that, in spite of arbitrary port migration, if a message is sent on a channel's output port and the channel's input port eventually stops migrating then the message will eventually reach it. In addition, our algorithms are efficient and robust: They dynamically shorten the logical routing paths caused by port migration, collect unused routing table entries, and are resilient to a certain class of network faults. We prove that the algorithms implement the operational semantics using the refinement mapping proof technique and temporal logic inference rules for weak and strong fairness, demonstrating the effectiveness of combining these two methods.
Replay, recovery, replication, and snapshots of nondeterministic concurrent programs
Gaifman H., Maker M. J. & Shapiro E. (1991) Proceedings of the Annual ACM Symposium on Principles of Distributed Computing. p. 241-255 (trueProceedings of the Annual ACM Symposium on Principles of Distributed Computing). Abstract
The problem of replaying computations of non-deterministic concurrent programs arises in contexts such as debugging and recovery. We investigate the problem for an abstract model of concurrency, which generalizes dataflow networks, processors with shared variables, and logic programming models of concurrency. We say that nondeterminism is visible if the state is determined, up to some (appropriately defined) notion of equivalence, by the external behavior. We show that if nondeterminism is visible then replay is achievable using a. one-step lookahead sequential simulation algorithm. If the program has an additional monotonicity property called stability then recovery is possible without simulating the original computation, by restarting the program from a. certain easily constructed state. Also, for stable programs with visible nondeterminism, a process composed of identical parallel processes has the same external behavior as each of its components. Hence high crash-failure resilience is achievable by simple process replication. For such programs there is also an easy solution to the asynchronous snapshot problem. Stability holds for certain concurrent logic/constraint programming languages. We describe an efficient method for transforming a given stable concurrent logic/constraint program to an equivalent one with visible nondeterminism. The transformation has acceptable execution overhead, thus it could be employed in a practical realization of the proposed methods.
Embeddings among concurrent programming - languages
Shapiro E. (1991) Parle 91 Parallel Architectures and Languages Europe. Aarts E. H. L., Leeuwen J. & Rem M.(eds.). Vol. 505. p. 480-480 Abstract
In a previous paper [1] we developed an algebraic framework for language comparison based on language embeddings, which are mappings that preserve certain aspects of both the syntactic and the semantic structure of the language. We provided separation results by demonstrating the nonexistence of various kinds of embeddings among languages. In this talk we survey the framework and complement the separation results with positive results, i.e., demonstrate embeddings among various well-known concurrent languages. Together the positive and negative results induce a preordering on the family of concurrent programming languages that quite often coincides with previous intuitions on the "expressive power" of these languages.
Temporal debugging and its visual animation
Feldman Y. & Shapiro E. (1991) p. 3-17 Abstract
A bug isolation technique called Temporal Debugging is suggested in an attempt to improve on Concurrent Algorithmic Debugging [6] in both query complexity and number of queries. Temporal Debugging relies on a programmer's knowledge of the correctness of partial computational histories and optimizes the search based on the programmer's ability to specify the earliest error manifestation that he or she has observed. We then present an approach of combining visualization of computation and Temporal Debugging in an effort to come up with a practical algorithmic debugger for concurrent logic programs. Abstractions of concurrent computation such as process groups, data clusters, and abstract events are proposed as a basis for a visual animation method called Interactive Communicating Thread Animation that is suggested as a visual user interface to Temporal Debugging.
Polymorphically Typed Logic Programs
YARDENI E., FRUEHWIRTH T. & SHAPIRO E. (1991) Logic Programming: Proceedings of the Eighth International Conference. FURUKAWA K.(eds.). p. 379-393 (trueLOGIC PROGRAMMING). Abstract
Logic programs as types for logic programs\u200f
FRUHWIRTH T., SHAPIRO E., VARDI M. & YARDENI E. (1991) Sixth Annual IEEE Symposium on Logic in Computer Science. p. 300-309 Abstract
Optimistic type systems for logic programs are considered. In such systems types are conservative approximations to the success set of the program predicates. The use of logic programs to describe types is proposed. It is argued that this approach unifies the denotational and operational approaches to descriptive type systems and is simpler and more natural than previous approaches. The focus is on the use of unary-predicate programs to describe the types. A proper class of unary-predicate programs is identified, and it is shown that it is expensive enough to express several notions of types. An analogy with two-way automata and a correspondence with alternating algorithms are used to obtain a complexity characterization of type inference and type checking. This characterization is facilitated by the use of logic programs to represent types.
Lexical logic programs
MOSCOWITZ Y. & SHAPIRO E. (1991) Logic programming : proceedings of the 8th international conference. FURUKAWA K.(eds.). p. 349-363 (trueLOGIC PROGRAMMING). Abstract
Separating concurrent languages with categories of language embedding
Shapiro E. (1991) Proceedings of the 23rd Annual ACM Symposium on Theory of Computing, STOC 1991. p. 198-208 (trueProceedings of the Annual ACM Symposium on Theory of Computing). Abstract
Concurrent programming enjoys a proliferation of languages but suffers from the lack of a general method of language comparison. In particular, concurrent (as well as sequential) programming languages cannot be usefully distinguished based on complexitytheoretic considerations, since most of them are Turingcomplete. Nevertheless, differences between programming languages matter, else we would not have invented so many of them. We develop a general method for comparing concurrent programming languages based on their algebraic (structural) complexity, and, using this method, achieve separation results among many well-known concurrent languages. The method is not restricted to concurrent languages. It can be used to compare the algebraic complexity of abstract machine models, other families of programming languages, logics, and, more generaly, any family of languages with some syntactic operations and a notion of semantic equivalence. The method can also be used to compare the algebraic complexity of families of operations wit hin a language or across languages. We note that using the method we were able to compare languages and computational models that do not have a common semantic basis.

1990

Guest editors' preface
Furukawa K., Amamiya M., Shapiro E. & Warren D. H. (1990) New Generation Computing. 7, p. 85-87 Abstract
The languages FCP(:) and FCP(:,?)
Yardeni E. H., KLIGER S. & Shapiro E. (1990) New Generation Computing. 7, p. 89-107 Abstract
The language FCP(:,?) is the outcome of attempts to integrate the best of several flat concurrent logic programming languages, including Flat GHC, FCP (↓, |) and Flat Concurrent Prolog, in a single consistent framework. FCP(:) is a subset of FCP(:, ?), which is a variant of FPP(↓, |) and employs concepts of the concurrent constraint framework of cc(↓, |). FCP(:, ?) is a language which is strong enough to accommodate all useful concurrent logic programming techniques, including those which rely on atomic test unification and read-only variables, yet incorporates the weaker languages mentioned as subsets. This allows the programmer to remain within a simple subset of the language such as Flat GHC when the full power of atomic unification or read-only variables is not needed.
Distributed variable server for atomic unification
KLEINMAN A., MOSES Y. & SHAPIRO E. (1990) Proceedings of the ninth annual ACM symposium on Principles of distributed computing. p. 59-74 Abstract
Processes in concurrent logic programs communicate and synchronize using shared singleassignment variables. Communication is performed via unification operations, while synchronization is performed through input matching. The more expressive concurrent logic languages employ atomic unification as the basic communication primitive. Atomic unification can be thought of as an atomic transaction that either fails with no trace or succeeds in writing on all of the necessary writable variables occurring in the unification instance. This paper presents a distributed variable server algorithm that can be used as the key component in a distributed implementation of concurrent logic languages with atomic unification. The variable server provides an abstraction in which processes can act as if all of the shared variables are local. It thus allows programs to be written for a shared memory model and executed in a system in which memory is distributed. We rigorously specify the requirements the variable server must fulfill, present an algorithm satisfying the specification, and prove its correctness with respect to the specification. The algorithm has been implemented for the language Flat Concurrent Prolog (FCP) and is incorporated in the distributed version of the Logix system.
FCP Sequential Abstract Machine Characteristics for the Systems Development Workload
ALKALAJ L., LANG T. & SHAPIRO E. (1990) LOGIC PROGRAMMING : PROCEEDINGS OF THE 1990 NORTH AMERICAN CONFERENCE. HERMENEGILDO M. & DEBRAY S.(eds.). p. 321-339 (trueLOGIC PROGRAMMING). Abstract
From decision trees to decision graphs
KLIGER S. & SHAPIRO E. (1990) LOGIC PROGRAMMING : PROCEEDINGS OF THE 1990 NORTH AMERICAN CONFERENCE. HERMENEGILDO M. & DEBRAY S.(eds.). p. 97-116 (trueLOGIC PROGRAMMING). Abstract

1989

Linda in context
Shapiro E. (1989) Communications of the ACM. 32, 10, p. 1244-1249 Abstract
A sequential abstract machine for Flat Concurrent Prolog\u200f
Houri A. & Shapiro E. (1989) The Journal of Logic Programming. 7, 2, p. 85-123 Abstract
This paper describes a uniprocessor implementation of Flat Concurrent Prolog, based on an abstract machine and a compiler for it. The machine instruction set includes the functionality necessary to implement efficiently the parallel semantics of the language. In addition, the design includes a novel approach to the integration of a module system into a language. Both the compiler and the emulator for the abstract machine have been implemented and form the basis of Logix, a practical programming environment for Flat Concurrent Prolog. Its performance suggests that a process-oriented language need not be less efficient then a procedure-oriented language, even on a uniprocessor. In particular, it shows that a process queue and process spawning can be implemented as effectively as an activation stack and procedure calling, and thus debunks the \u201cexpensive-process-spawn myth\u201d.
The family of concurrent logic programming languages\u200f
Shapiro E. (1989) Computing Surveys. 21, 3, p. 413-510 Abstract
Concurrent logic languages are high-level programming languages for parallel and distributed systems that offer a wide range of both known and novel concurrent programming techniques. Being logic programming languages, they preserve many advantages of the abstract logic programming model, including the logical reading of programs and computations, the convenience of representing data structures with logical terms and manipulating them using unification, and the amenability to metaprogramming. Operationally, their model of computation consists of a dynamic set of concurrent processes, communicating by instantiating shared logical variables, synchronizing by waiting for variables to be instantiated, and making nondeterministic choices, possibly based on the availability of values of variables. This paper surveys the family of concurrent logic programming languages within a uniform operational framework. It demonstrates the expressive power of even the simplest language in the family and investigates how varying the basic synchronization and control constructs affect the expressiveness and efficiency of the resulting languages. In addition, the paper reports on techniques for sequential and parallel implementation of languages in this family, mentions their applications to date, and relates these languages to the abstract logic programming model, to the programming language PROLOG, and to other concurrent computational models and programming languages.
A distributed locking algorithm for a transactions system in flat concurrent prolog\u200f
RECHES E., GUDES E. & Shapiro E. (1989) Foundations of Data Organization and Algorithms. Schek H. & Litwin W.(eds.). Vol. 367. p. 304-320 (trueLecture Notes in Computer Science). Abstract
This paper reports on the design and the implementation of a distributed transactions-system for a Universal File Server. The system maintains consistency in a general purpose file-system by means of concurrency control and crash recovery. Both the distributivity of a transaction and the intra-transaction concurrency, are depicted by a single model which describes the transaction as a partially ordered set of operations. The main concurrency control algorithm described in this paper is a novel distributed locking management algorithm based on the two-phase-locking (2pl) protocol. The system is implemented in Flat Concurrent Prolog (FCP), a concurrent logic programming language. FCP lends itself to the development of new distributed algorithms which utilize the fine-grained concurrency and the powerful communication and synchronization mechanisms supplied by the language. The features of concurrent logic languages, which are useful for implementing file and database systems are demonstrated in this paper.
OR-parallel PROLOG in flat concurrent PROLOG
Shapiro E. (1989) The Journal of Logic Programming. 6, 3, p. 243-267 Abstract
We describe a simple or-parallel execution algorithm for PROLOG that naturally collects all solutions to a goal. For a large class of programs the algorithm has O(log n) overhead and exhibits O(n/(log n)2) parallel speedup over the standard sequential algorithm. Its constituent parallel processes are independent, and hence the algorithm is suitable for implementation on non-shared-memory parallel computers. The algorithm can be implemented directly in Flat Concurrent PROLOG. We describe a simple interpreter-based fcp implementation of the algorithm, analyze its performance under Logix, and include initial measurements of its speedup on the parallel implementation of fcp. The implementation is easily extended. We show an extension that performs parallel demand-driven search. We define two parallel variants of cut, cut-clause and cut-goal, and describe their implementation. We discuss the execution of the algorithm on a parallel computer, and describe implementations of it that perform centralized and distributed dynamic load balancing. Since the fcp implementation of the algorithm relies on full test unification, the algorithm does not seem to have a similarly natural implementation in ghc or parlog.
FCP: A summary of performance results
Taylor S., Shapiro R. & Shapiro E. (1989) Proceedings of the 3rd Conference on Hypercube Concurrent Computers and Applications. Fox G.(eds.). p. 1364-1373 (trueProceedings of the 3rd Conference on Hypercube Concurrent Computers and Applications: Architecture, Software, Computer Systems, and General Issues, C3P 1988). Abstract
Flat Concurrent Prolog is a simple concurrent programming language which has been used for a variety of non-trivial applications. A compiler based parallel implementation has been completed which operates on an Intel Hypercube. This paper presents a brief summary of performance data from a recent study of the implementation. Three categories of program were studied: parallel applications, uniprocessor benchmarks and communication stereotypes. The latter programs are abstractions of common parallel programming techniques and serve to quantify the cost of communication in the language.
Concurrent algorithmic debugging
LICHTENSTEIN Y. & Shapiro E. (1989) Proceedings of the 1988 ACM SIGPLAN and SIGOPS workshop on Parallel and distributed debugging. 1 ed. Vol. 24. p. 248-260 (trueSigplan Notices). Abstract
Algorithmic Debugging is a theory of debugging that uses queries on the compositional semantics of a program in order to localize bugs. It uses the following principle: if a computation of a program's component gives an incorrect result, while all the subcomputations it invokes compute correct results, then the code of this component is erroneous. Algorithmic Debugging is applied, in this work, to reactive systems, in particular to programs written in Flat Concurrent Prolog (FCP). Debugging reactive systems is known to be more difficult than the debugging of functional systems. A functional system is fully described by the relation between its initial input and final output; this context-freedom is used in debugging. A reactive system continuously responds to external inputs, thus its debugging cannot make use of context-free input/output relations. Given a compositional semantic model for a concurrent programming language, we demonstrate how one can directly apply the ideas of Algorithmic Debugging to obtain a theory of program debugging for the considered language. The conflict between the context-freedom of input/ output relations and the reactive nature of concurrent systems is resolved by using semantic objects which record the reactive nature of the system's components. In functional algorithmic debugging the queries relate to input/output relations; in concurrent algorithmic debugging the queries refer to semantic objects called processes which capture the reactive nature of FCP computations. A diagnosis algorithm for incorrect FCP programs is proposed. The algorithm gets an erroneous computation and using queries isolates an erroneous clause or an incomplete procedure. An FCP implementation of the diagnosis algorithm demonstrates the usefulness as well as the difficulties of Algorithmic Debugging of FCP programs.
Using safe approximations of fixed points for analysis of logic programs
CODISH M., GALLAGHER J. & SHAPIRO E. (1989) Meta-programming in logic programming. ABRAMSON H. & ROGERS MH.(eds.). p. 233-261 (trueLOGIC PROGRAMMING). Abstract
In this paper we apply the result that when using abstract interpretation for program analysis, we do not necessarily need to compute an abstract least fixed point, or indeed any fixed point of the abstract semantic function. Any meaning which is a safe approximation of the least fixed point can give useful information. In particular, the sequence of meanings obtained by applying the semantic function iteratively to the top element of the domain yields safe approximations, since this sequence approaches the greatest fixed point from above. The framework of the technique is first defined, and its advantages and disadvantages discussed. Two applications of the technique for the parallel logic programming language Flat Concurrent Prolog (FCP) are presented. The mode analysis application is based on a collecting semantics for SLD derivations, and has been incorporated in a compiler for FCP. The suspension analysis application is the first attempt, to our knowledge, to apply abstract interpretation to analyse reactive aspects of a parallel programming language.
Proof theory and semantics of logic programs\u200f
GAIFMAN H. & SHAPIRO E. (1989) Fourth Annual Symposium on Logic in Computer Science. Anon(eds.). p. 50-62 Abstract
The authors develop a resolution logic that is based on direct proofs rather than on proofs by refutations. The deductive system studied has clauses as its formulas and resolution as the sole inference rule. They analyze this deductive system using a novel representation of resolution proofs, called resolution graphs, and obtain a general completeness theorem: a clause is a logical consequence of a set of clauses if and only if it is either tautological or subsumed by a clause derivable from that set. In a previous paper (proc. 16th ACM Symp. on Principles of Prog. Lang., pp.134-42, 1989), the authors developed a model-theoretic compositional semantics for logic programs and investigated the fully abstract equivalences induced by various notions of composition. They continue that study here using the proof theory of resolution logic. This proof theory gives rise to various semantics for logic programs that reflect more operational details than does the model-theoretic semantics.
Fully abstract compositional semantics for logic programs\u200f
GAIFMAN H. & SHAPIRO E. (1989) CONFERENCE RECORD OF THE SIXTEENTH ANNUAL ACM SYMPOSIUM ON PRINCIPLES OF PROGRAMMING LANGUAGES. p. 134-142 Abstract
We propose a framework for discussing fully abstract compositional semantics, which exposes the interrelations between the choices of observables, compositions, and meanings. Every choice of observables and compositions determines a unique fully abstract equivalence. A semantics is fully abstract if it induces this equivalence. We study the semantics of logic programs within this framework. We find the classical Herbrand-base semantics of logic programs inadequate, since it identifies programs that should be distinguished and vice versa. We therefore propose alternative semantics, and consider the cases of no compositions, composition by program union, and composition of logic modules (programs with designated exported and imported predicates). Although equivalent programs can be in different vocabularies, we prove that our fully abstract semantics are always in a subvocabulary of that of the program. This subvocabulary, called the essential vocabulary, is common to all equivalent programs. The essential vocabulary is also the smallest subvocabulary in which an equivalent program can be written.
Reactive Behavior Semantics for Concurrent Constraint Logic\u200f Programs - (Preliminary Version)
GAIFMAN H., MAHER M. & SHAPIRO E. (1989) LOGIC PROGRAMMING : PROCEEDINGS OF THE NORTH AMERICAN CONFERENCE, 1989, VOL 1-2. LUSK EL. & OVERBEEK RA.(eds.). p. 553-569 (trueLOGIC PROGRAMMING). Abstract
Control algebras for constraint logics and testing equivalence
Gerth R., Codish M. & Shapiro E. (1989) Vol. CS89-16. (trueTechnical Report). Abstract

1988

Fully abstract denotational semantics for Flat Concurrent Prolog.
Gerth R., Codish M., Lichtenstein Y. & Shapiro E. (1988) Proc Third Annu Symp on Logic in Comput Sci. p. 320-335 (trueProc Third Annu Symp on Logic in Comput Sci). Abstract
A denotational, hence, compositional semantics for a a subset of concurrent Prolog is developed and related to an operational semantics. The denotational semantics makes divergence and the resultant substitutions of finite computations together with the termination mode--success, failure, or deadlock--observable. Relative to this notion of observation it is proved that the denotational semantics is fully abstract in the sense that it records the minimal amount of extra information beyond the observables to make it compositional. Full abstraction is an important property because it quantifies the information that one needs in order to reason about individual program-parts independently. This is believed to be the first such result in the area of concurrent logic programming.
Specialisation of Prolog and FCP programs using abstract interpretation
GALLAGHER J., CODISH M. & Shapiro E. (1988) New Generation Computing. 6, p. 159-186 Abstract
This paper presents an approach to specialising logic programs which is based on abstract interpretation. Program specialisation involves two stages, the construction of an abstract computation tree and a program construction stage. For the tree construction stage, abstract OLDT resolution is defined and used to construct a complete and finite tree corresponding to a given logic program and a goal. In the program construction stage, a specialised program is extracted from this tree.We focus on two logic programming languages: sequential Prolog and Flat Concurrent Prolog. Although the procedural reading of concurrent logic programs is very different from that of sequential programs, the techniques presented provide a uniform approach to the specialisation of both languages. We present the results for Prolog rigorously, and extend them less formally to Flat Concurrent Prolog.There are two main advantages of basing program specialisation on abstract interpretation. Firstly, termination can be ensured by using abstract interpretations over finite domains, while performing a complete flow analysis of the program. Secondly, correctness of the specialised program is closely related to well-defined consistency conditions on the concrete and abstract interpretations.
Computation control and protection in the Logix system
Hirsch M., Silverman W. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 28-45 Abstract
Translation of safe GHC and safe Concurrent Prolog to FCP
Levy J. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 383-414 Abstract
Representation and enumeration of flat Concurrent Prolog computations
Lichtenstein Y., Codish M. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 197-210 Abstract
Hardware description and simulation using Concurrent Prolog
Weinbaum D. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 470-490 Abstract
The Logix system user manual version 1.21
Silverman W., Hirsch M., Houri A. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 46-77 Abstract
Embedding linda and other joys of concurrent logic programming
Shapiro E. (1988) Communications of The ACM - CACM. Abstract
Systolic programming: a paradigm of parallel processing
Shapiro E. (1988) Concurrent Prolog. Shapiro E. & Fuchi K.(eds.). Cambridge, MA, USA: . p. 207-242 Abstract
Meta interpreters for real
Safra S. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 166-179 Abstract
Abstract Algorithmic Debugging
Lichtenstein Y. & Shapiro E. (1988) Proceedings of the 5th Logic Programming. Abstract
Image processing with Concurrent Prolog
Edelman S. & Shapiro E. (1988) Concurrent Prolog. Shapiro E. & Fuchi K.(eds.). Cambridge, MA, USA: . p. 339-369 Abstract
A subset of Concurrent Prolog and its interpreter
Shapiro E. (1988) Concurrent Prolog. Shapiro E. & Fuchi K.(eds.). Cambridge, MA, USA: . p. 27-83 Abstract
A test for the adequacy of a language for an architecture
Shapiro E. (1988) Concurrent Prolog. Shapiro E. & Fuchi K.(eds.). Cambridge, MA, USA: . p. 370-388 Abstract
An architecture of a distributed window system and its FCP implementation
Katzenellenbogen D., Cohen S. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 101-139 Abstract
CFLA concurrent functional language embedded in a concurrent logic programming environment
Levy J. & Shapiro E. (1988) Concurrent Prolog: Collected Papers. Shapiro E.(eds.). Cambridge, MA, USA: . p. 442-469 Abstract
The Panel on Theory and Practice of Concurrent Systems
Shapiro E. (1988) Proceedings of the International Conference on Fifth Generation Computer Systems, FGCS 1988, Tokyo, Japan, November 28-December 2, 1988. p. 152-153 Abstract
The panel on theory and practice of concurrent systems brings together researchers who approach the problems of concurrency from radically different angles. It is my hope that by exposing and confronting the different perspectives of the penalists all of us will enhance our understanding of what are the problems, concepts, current research directions, and the long term visions of this area of investigation. In this short note I will attempt to outline the perspectives represented by the penalist. This presentation does not attempt to convey the present ideas and positions of the penalists; these will be presented in the accompanying position papers, written concurrently with this note. Rather, it represents my subjective impressions of the research directions pursued by the panelists. The description is necessarily rough and terse, and since it was not screened by the other panelists, may contain impressions which are either wrong or not up to date. Specifically, the panelists may present a position different than the one ascribed to them in this note.
Detecting stable properties of networks in concurrent logic programming languages
Saraswat V., Weinbaum D., Kahn K. & Shapiro E. (1988) Proceedings of the 7th Annual ACM Symposium on Principles of Distributed Computing, PODC 1988. p. 210-222 (trueProceedings of the Annual ACM Symposium on Principles of Distributed Computing). Abstract
A significant motivation for programming language research is to find good abstractions and conceptual frameworks that enable us to understand and reason about certain kinds of complex computational phenomena in a simple way. In this paper we show that the specification of programs for the detection of stable properties of networks of communicating processes is facilitated - almost trivialized - in the framework of concurrent logic programming languages. We present the embedded short-circuit programming technique for detecting stable properties of networks, and illustrate its utility by providing a simple algorithm for distributed termination detection. We show that the runtime behavior of a concurrent logic program incorporating this technique (for diffusing computations) is similar to the "DSA" scheme described in [7] for maintaining distributed counters. We modify the technique for repeated detection of quiescence for phased computations such as discrete simulations. We show that the abstraction can be manipulated without reference to its implementation by presenting a simple solution to the distributed knotdetection problem [16]. Finally we show that, under certain conditions, a simple extension allows repeated (live) snapshots [3] of the state of the computation to be taken almost trivially, thus providing another technique for detecting stable properties.

1987

A layered method for process and code mapping
TAYLOR S., AVRON E. & Shapiro E. (1987) New Generation Computing. 5, p. 185-205 Abstract
The mapping problem has been shown to be computationally equivalent to the graph isomorphism problem; as such it is unlikely that a polynomial time algosrithm exists for its solution. Practical algorithms of general applicability and low computational complexity have not been found and are unlikely to appear.This paper describes a layered method to support specialised process and code mapping strategies. The method separates the task of mapping a problem to a virtual machine from the task of mapping a virtual machine to a physical machine. It allows multiple virtual machines to execute concurrently and many applications to run on a single virtual machine.The method is in use on a parallel implementation of Flat Concurrent Prolog which runs on an Intel iPSC Hypercube; for concreteness the description is based on this implementation.
Notes on the complexity of systolic programs
TAYLOR S., HELLERSTEIN L., SAFRA S. & Shapiro E. (1987) Journal of Parallel and Distributed Computing. 4, 3, p. 250-265 Abstract
This paper presents two notes which discuss basic theoretical issues concerning the systolic programming methodology and demonstrates simple techniques which can by used to structure communication. The first note shows how the complexity of two simple algorithms is adversely affected by the cost of data movement in a parallel system. Programming techniques which introduce pipelining are used to overcome the problem and improve the complexity. The second note shows an upper bound on the speedup which can be obtained using the methodology on a mesh connected architecture. This bound is stated in terms of the sequential lower bound for the algorithm concerned.
Compiling OR-parallelism into AND-parallelism
CODISH M. & Shapiro E. (1987) New Generation Computing. 5, 1, p. 45-61 Abstract
This paper suggests a general method for compiling OR-parallelism into AND-parallelism. An interpreter for an AND/OR-parallel language written in the AND-parallel subset of the language induces a source-to-source transformation from the full language into the AND-parallel subset. This transformation can be identified and implemented as a special purpose compiler or applied using a general purpose partial evaluator.The method is demonstrated to compile a variant of Concurrent Prolog into an AND-parallel subset of the language called Flat Concurrent Prolog (FCP). It is also shown applicable to the compilation of OR-parallel Prolog to FCP. The transformation identified is simple and efficient. The performance of the method is discussed in the context of programming examples. These compare well with conventionally compiled Prolog programs.
An architectural model for a flat concurrent prolog processor
Alkalaj L. & Shapiro E. (1987) Abstract
The authors propose an execution model and a special-purpose processor architecture for the execution of Flat Concurrent Prolog (FCP). The execution model defines concurrency inherent to the execution of FCP on a single processor. It is derived by partitioning the FCP Sequential Abstract Machine into concurrently executing units. To support this execution model, the FCP Processor architecture consists of the following concurrent functional processors: Reduction Processor, Tag Processor, Goal Management Processor, Instruction Porcessor and Data-Trail Processor. The Goal Management Processor performs the efficient management of concurrent FCP goals reduced by the Reduction Processor. The Data-Trail Processor implements a novel cache management algorithm which supports shallow backtracking. The FCP Processor architectural model is specified in FCP itself and is part of a working simulator. The attainable performance of the FCP Processor architecture is currently under investigation.
The Art of Prolog: Programming Examples - Macintosh (Logic Programming)
Sterling L. & Shapiro E. Y. (1987) (trueLogic Programming). Abstract
The Use of Logic Programming in Education
Scherz Z., Maler O. & Shapiro E. (1987) Eurit 86. p. 531-537 Abstract
This study is based on the belief that in the near future, computer programming using high level programming language techniques, should become an integral part of all school subjects, as reading, writing and arithmetics are today.A program designed for teaching Prolog (PROgramming in LOGic) as a first computer language for students and teachers at junior high and high school level is introduced. A unique approach of introducing basic concepts of logic programming within its proportional level is described.Applications of Prolog in different school subjects, and its potential to enhance logical and deductive reasoning and problem solving abilities are discussed.
Concurrent Prolog: Collected Papers
Shapiro E. (1987) Cambridge, MA, USA: . Abstract
The art of Prolog: advanced programming techniques
Shapiro E. & Sterling L. (1987) Abstract
A hypercube implementation of Flat Concurrent Prolog\u200f
Taylor S., Safra S. & Shapiro E. (1987) Hypercube Multiprocessors, 1987. p. 48-54 Abstract
Flat Concurrent Prolog is a simple, practical, concurrent programming language which has an efficient uni-processor implementation. This paper summarises the problems involved in implementing the language on a parallel architecture. The main concepts employed in an initial parallel interpreter are described; the interpreter has been implemented on an Intel iPSC Hypercube.Introduction. Flat Concurrent Prolog (FCP) is a simple, process oriented, single assignment language [l]. It provides an interesting framework to investigate parallel processing problems for a variety of reasons:

1986

Concurrent Prolog: A progress report
Shapiro E. (1986) Computer. 19, 8, p. 44-58 Abstract
Implementing parallel algorithms in Concurrent Prolog: The Maxflow experience
HELLERSTEIN L. & Shapiro E. (1986) The Journal of Logic Programming. 3, 2, p. 157-184 Abstract
This paper reports on the experience of implementing Shiloach and Vishkin's parallel Maxflow algorithm [7] in Concurrent PROLOG. The major difficulties in this endeavor were understanding the algorithm, which is intricate, and adapting it to the computational model of Concurrent PROLOG. In spite of the difficulties, we were able to produce a Concurrent PROLOG program that implements the algorithm and achieves the expected complexity bounds. The lack of destructive assignment in the logic program's computation model prevents PROLOG from being an efficient implementation language for many sequential algorithms. Our main conclusion is that, in concurrent algorithms, message passing is a powerful substitute for destructive assignment. It is therefore possible to write efficient Concurrent PROLOG implementations of concurrent algorithms.
Multiway merge with constant delay in Concurrent Prolog
Shapiro E. & SAFRA S. (1986) New Generation Computing. 4, 2, p. 211-216 Abstract
Multiway dynamic mergers with constant delay are an essential component of a parallel logic programming language. Previous attempts to defined efficient mergers have required complex optimising compilers and run-time support.This paper proposes a simple technique to implement mergers efficiently. The technique requires an additional data type and the definition of an operation on it. The operation allows multiple processes to access a stream without incurring the cost of searching for the end of stream. It is specified in Concurrent Prolog and is used to define multiple assignment variables using a monitor.The technique forms the basis for stream merging in Logix, a practical programming environment written in Flat Concurrent Prolog.
Concurrent Prolog: A progress report
Shapiro E. (1986) Fundamentals of Artificial Intelligence. Jorrand P. & Bibel W.(eds.). Vol. 232. p. 277-313 (trueLecture Notes in Computer Science). Abstract
Concurrent Prolog is a logic programming language designed for concurrent programming and parallel execution. It is a process oriented language, which embodies dataflow synchronization and guarded-command indeterminacy as its basic control mechanisms.The paper outlines the basic concepts and definition of the language, and surveys the major programming techniques that emerged out of three years of its use. The history of the language development, implementation, and applications to date is reviewed. Details of the performance of its compiler and the functionality of Logix, its programming environment and operating system, are provided.
A parallel implementation of Flat Concurrent Prolog
TAYLOR S., SAFRA S. & Shapiro E. (1986) International Journal of Parallel Programming. 15, 3, p. 245-275 Abstract
Flat Concurrent Prolog is a simple, practical, concurrent programming language which has an efficient uniprocessor implementation. This paper describes an initial parallel implementation of the language; it consists of an interpreter implemented on an Intel iPSC Hypercube. The parallel execution of concurrent logic programming languages involves many nontrivial implementation problems. Some of these problems are well known and have been treated extensively in the literature. The most difficult task is to integrate problem solutions in a coherent and efficient manner. The algorithm presented has been useful in providing insights into the major problems and includes a number of novel ideas to simplify implementation. It does not attempt to solve all the problems involved but rather provides a workable basis for current and future research. The algorithm is under ongoing refinement, simplification and improvement.
A New approach for introducing Prolog to naive users
Maler O., Scherz Z. & Shapiro E. (1986) Third International Conference on Logic Programming. Shapiro E.(eds.). Berlin, Heidelberg: . Vol. 225. p. 544-551 (trueLecture Notes in Computer Science). Abstract
Most Prolog textbooks are based on gradual presentation of Prolog's concepts, as well as some concepts from logic. We think that this order of presentation used in most books is inadequate for readers with little prior experience with logic, mathematics or programming. In this paper we introduce a new approach for teaching Prolog to these \u201cnaive\u201d users. Within our framework, the order of concept introduction is based upon the order used in teaching logic: from propositions to predicates. We believe that this approach will ease the difficulties encountered by newcomers to Prolog, and will provide a good basis for better understanding of logic, programming, data-base theory, and perhaps also other relevant disciplines.The choice of Prolog as a first programming language is also discussed, and we argue that it is a better choice than conventional languages in the same sense that those languages are considered better than machine languages. This claim however should be empirically verified.
Third International Conference on Logic Programming, Imperial College of Science and Technology, London, United Kingdom, July 14-18, 1986 Proceedings
Shapiro E. Y. (1986) Vol. 225. Abstract
Prolog, Fortgeschrittene Programmiertechniken
Sterling L. & Shapiro E. (1986) Abstract
Compiling OR-parallelism into AND-parallelism
CODISH M. & Shapiro E. (1986) International Conference on Logic Programming. Shapiro E.(eds.). Vol. 225. p. 283-297 (trueLecture Notes in Computer Science). Abstract
This paper suggests a general method for compiling OR-parallelism into AND-parallelism. An interpreter for an AND/OR-parallel language written in the AND-parallel subset of the language induces a source-to-source transformation from the full language into the AND-parallel subset. This transformation can be identified and implemented as a special purpose compiler or applied using a general purpose partial evaluator.The method is demonstrated to compile a variant of Concurrent Prolog into an AND-parallel subset of the language called Flat Concurrent Prolog (FCP). It is also shown applicable to the compilation of OR-parallel Prolog to FCP. The transformation identified is simple and efficient. The performance of the method is discussed in the context of programming examples. These compare well with conventionally compiled Prolog programs.

1985

Quadtrees in concurrent prolog
Edelman S. & Shapiro E. (1985) Proceedings of the International Conference on Parallel Processing. DeGroot D.(eds.). p. 544-551 (trueProceedings of the International Conference on Parallel Processing). Abstract
A Concurrent Prolog implementation of several quadtree-related algorithms is described. An optimal quadtree-building algorithm that uses a bottom-up approach is implemented, analyzed and compared to other parallel quadtree algorithms in the literature. Distributed algorithms for centroid coordinates computation and for connected component labeling are implemented and analyzed, as examples of synchronous and asynchronous parallel algorithms, respectively.
Polymorphic Arrays: An Architecture for a Programmable Systolic Machine
Fiat A., Shamir A. & Shapiro E. (1985) Proceedings of the International Conference on Parallel Processing. DeGroot D.(eds.). p. 112-117 (trueProceedings of the International Conference on Parallel Processing). Abstract
The problem of designing a systolic machine layout that can accommodate rectangular process arrays while guaranteeing local communication and even load distribution is considered. Several architectures for a programmable systolic machine are evaluated, and a new 'infinite grid' architecture, termed polymorphic arrays, is described. A polymorphic array of M processors can accommodate any rectangular process-array that contains M/ ROOT 5 approximately equals 0. 4472 multiplied by (times) M processes with no more than one process per processor, and this does not depend on the aspect ratio. In addition, the process vectors 1 multiplied by M and M multiplied by 1 execute with no processor assigned more than one process. The mapping of processes to processors is continuous and consistent. It can accommodate rectangular process arrays of all aspect ratios and all sizes with the maximum load-difference between processors bounded by a constant which is a logarithmic function of M. Polymorphic arrays are actually optimal in that no other consistent assignment of processes to processors can improve the bound of M/ ROOT 5 and the imbalance bound of log (M). A polymorphic array of arbitrary size has a two-dimensional realization in which the processor interconnection wire length is of length ROOT 8 multiplied by (times) C, where C is the width of a processor. A subset of polymorphic arrays, called regular polymorphic arrays, has a two-dimensional realization using nine local building blocks.

1984

Guest editor's preface
Shapiro E. (1984) New Generation Computing. 2, 4, p. 305-308 Abstract
Fair, biased, and self-balancing merge operators: Their specification and implementation in Concurrent Prolog
Shapiro E. & MIEROWSKY C. (1984) New Generation Computing. 2, p. 221-240 Abstract
The problem of allowing a dynamically changing set of processes fair access to a shared resource is considered, in the context of communication-stream based systems. It is argued that fair binary merge operators alone cannot solve this problem satisfactorily. Two solutions are proposed. One employs binary merge operators with a programmable bias; the other binary and ternary fair merge operators capable of self-balancing, using the concept of 23 trees. A Concurrent Prolog implementation of these operators is described. The implementation of the self-balancing merge operators illustrates the expressive power of incomplete messages, a programming technique that supports messages that contain communication channels as arguments. In the course of implementing the self-balancing merge operator, it was necessary to develop a distributed variant of the 23 tree deletion algorithm.
Alternation and the computational complexity of logic programs
Shapiro E. Y. (1984) The Journal of Logic Programming. 1, 1, p. 19-33 Abstract
We investigate the complexity of derivations from logic programs, and find it closely related to the complexity of computations of alternating Turing machines. In particular, we define three complexity measures over logic programs-goal-size, length, and depth-and show that goal-size is linearly related to alternating space, the product of length and goal-size is linearly related to alternating tree-size, and the product of depth and goal-size is linearly related to alternating time. The bounds obtained are simultaneous. As an application, we obtain a syntactic characterization of Nondeterministic Linear Space and Alternating Linear Space via logic programs.
Implementing Parallel Algorithms in Concurrent Prolog: The Maxflow Experience.
Hellerstein L. & Shapiro E. (1984) Unknown Host Publication Title. p. 99-115 Abstract
Fair, biased, and self balancing merge operators\u200f: Their Specification and Implementation in Concurrent Prolog
Shapiro E. & Mierowsky C. (1984) Proceedings of the IEEE International Symposium on Logic Programming. p. 83-90 Abstract
Systems Programming in Concurrent Prolog
Shapiro E. (1984) Proceedings of the 11th ACM SIGACT-SIGPLAN Symposium on Principles of Programming Languages. New York, NY, USA: . p. 93-105 (truePOPL '84). Abstract
Concurrent Prolog [28] combines the logic programming computation model with guarded-command indeterminacy and dataflow synchronization. It will form the basis of the Kernel Language [21] of the Parallel Inference Machine [36], planned by Japan's Fifth Generation Computers Project. This paper explores the feasibility of programming such a machine solely in Concurrent Prolog (in the absence of a lower-level programming language), by implementing in it a representative collection of systems programming problems.
Systolic Programming: A Paradigm of Parallel Processing.
Shapiro E. (1984) Unknown Host Publication Title. p. 458-470 Abstract
It is claimed that the systolic approach can lead to an algorithm design and programming methodology for general purpose, self-contained, high-level language parallel computers. This paper proposes a framework for realizing this potential, termed systolic programming. The framework comprises an abstract machine, a programming language, a process-to-processor mapping notation, and an algorithm development and programming methodology. An implementation of this framework using available technology is currently under investigation.

1983

Logic Programs With Uncertainties: A Tool for Implementing Rule-Based Systems
Shapiro E. Y. (1983) Proceedings of the 8th International Joint Conference on Artificial Intelligence. Bundy A.(eds.). p. 529-532 Abstract
One natural way to implement rule-based expert systems is via logic programs. The rules in such systems are usually definite clauses, or can easily be expressed as such, and the inference mechanbms used by such systems are built into the Prolog interpreter, or can be implemented in Prolog without much effort. The one component of expert systems which is not readily available in logic programs is a language for specifying certainties of rules and data, and a mechanism for computing certainties of conclusions, given certainties of the premises. Clark and McCabe [1] suggest an implementation technique for solving this problem. They augment each predicate in the rule-language with an additional argument, whose value is the certainty of the solution returned in this predicate, and augment the condition of each clause with an additional goal, whose purpose is' to compute the certainty of the conclusion of the clause, given the certainties of solutions to the goals in the condition of the clause. In this paper we propose a different way of implementing rule-based expert systems within Prolog, in which evaluation of certainties of solutions is carried out at the meta-level, within the logic program interpreter itself. This resulting framework, called logic programs with uncertainties, has the following properties: It is amenable to theoretical analysis. In particular, a precise semantics can be given to logic programs with uncertainties. Standard logic programs are a special case of logic programs with uncertainties. If all certainty factors are 1, then the semantics defined and the interpreters developed degenerate to the standard semantics and the standard interpreter for logic programs. Since the semantics of logic programs with uncertainties is simple, it is easy to apply the debugging algorithms developed in [3]. We consider the last point to be of great importance. Algorithmic debugging can provide the essential link between the expert and the executable realization of his knowledge, and thus facilitate the process of knowledge transfer and debugging. The paper defines the syntax and semantics of logic programs with uncertainties, develops an interpreter for such programs, and suggest how algorithmic debugging can be used by an expert to debug such rule-based systems. Examples are not included due to space limitations.
The fifth generation project a trip report
SHAPIRO E. (1983) Communications of the ACM. 26, 9, p. 637-641 Abstract
As part of Japan's effort to become a leader in the computer industry, the Institute for New Generation Computer Technology has launched a revolutionary ten-year plan for the development of large computer systems which will be applicable to knowledge information processing systems. These Fifth Generation computers will be built around the concepts of logic programming. In order to refute the accusation that Japan exploits knowledge from abroad without contributing any of its own, this project will stimulate original research and will make its results available to the international research community.
Playing Mastermind Logically
Shapiro E. (1983) SIGART Bull.. 85, p. 28-29 Abstract
I was intriqued by Mahadeva Rao's paper in SIGART #82 on an algorithm for playing mastermind, and its Pascal implementation. In this note I suggest a simpler algorithm for playing mastermind, and describe a miniature Prolog implementation of it.The version of mastermind we used to play as kids is slightly simpler than the one described by Rao, and needs less hardware (only pencil and paper). Player A chooses a secret code - a list of N distinct decimal digits (usually N=4 for beginners, 5 for advanced players). Player B makes guesses, and queries player A for the number of Bulls (number of digits that appear in identical positions in the guess and in the code) and Cows (number of digits that appear in the guess and in the code but in different positions) until there are N Bulls.
Object oriented programming in Concurrent Prolog
Shapiro E. & TAKEUCHI A. (1983) New Generation Computing. 1, p. 25-48 Abstract
It is shown that the basic operations of object-oriented programming languages creating an, object, sending and receiving messages, modifying an objects state, and forming class-superclass hierarchies can be implemented naturally in Concurrent Prolog. In addition, a new object-oriented programming paradigm, called incomplete messages, is presented. This paradigm subsumes stream communication, and greatly simplifies the complexity of programs defining communication networks and protocols for managing shared resources. Several interesting programs are presented, including a multiple-window manager. All programs have been developed and tested using the Concurrent Prolog interpreter described in.
Algorithmic Program DeBugging
Shapiro E. (1983) Abstract

1982

Algorithmic program diagnosis
Shapiro E. Y. (1982) Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL 1982. p. 299-308 (trueConference Record of the Annual ACM Symposium on Principles of Programming Languages). Abstract
The notion of program correctness with respect to an interpretation is defined for a class of programming languages. Under this definition, if a program terminates with an incorrect output then it contains an incorrect procedure. Algorithms for detecting incorrect procedures are developed. These algorithms formalize what experienced programmers may know already. A logic program implementation of these algorithms is described. Its performance suggests that the algorithms can be the backbone of debugging aids that go far beyond what is offered by current programming environments. Applications of algorithmic debugging to automatic program construction are explored.

1981

Model Inference System.
Shapiro E. Y. (1981) p. 1064 Abstract
An Algorithm that Infers Theories from Facts\u200f
Shapiro E. Y. (1981) Proceedings of the 7th international joint conference on Artificial intelligence. Vol. 1. p. 446-451 Abstract
A framework for inductive inference in logic is presented: a Model Inference Problem is defined, and it is shown that problems of machine learning and program synthesis from examples can be formulated naturally as model inference problems. A general, incremental inductive inference algorithm for solving model inference problems is developed. This algorithm is based on Popper's methodology of conjectures and refutations [II] . The algorithm can be shown to identify in the limit [3] any model in a family of complexity classes of models, is most powerful of its kind, and is flexible enough to have been successfully implemented for several concrete domains. The Model Inference System is a Prolog implementation of this algorithm, specialized to infer theories in Horn form. It can infer axiomatizations of concrete models from a small number of facts in a practical amount of time.
Inductive inference of theories from facts
Shapiro E. (1981) 192, Abstract
This paper is concerned with model inference problems and algorithms. A model inference problem is an abstraction of the problem faced by a scientist, working in some domain under some ﬁxed conceptual framework, performing experiments and trying to ﬁnd a theory capable of explaining their results. In this abstraction the domain of inquiry is the domain of some unknown model M for a given ﬁrst order language L, experiments are tests of the truth of sentences of L in M, and the goal is to ﬁnd a set of true hypotheses that imply all true testable sentences.
Pases: A Programming Environment for PASCAL
Shapiro E., Collins G., Johnson L. & Ruttenberg J. (1981) ACM SIGPLAN Notices. 16, 8, p. 50-57 Abstract
PASES is an interactive, residential programming environment that supports the creation of PASCAL programs [Jensen 74]. Its major components are a full screen structure editor and an interpreter, both operating on the same internal representation of PASCAL structures. The system is in an experimental stage. It is partly implemented, and feedback from its partial implementation has resulted in rethinking its design. We describe the current design and development of PASES, having two goals in mind: one, to describe the design ideas that we have embedded in this system; two, to describe the lessons we have learned about programming environments, as a result of attempting to implement these ideas.