User:Cscott/Ideas/Integrating MediaWiki

Installing a fully functional MediaWiki instance has become difficult. Users installing their distribution's packages or following our installation guide end up with only a bare-bones fragment of our software stack, missing most of the useful services and extensions which WMF has developed over the past decade. This is largely a feature, not a bug: our MediaWiki extension system has been very successful, and has allowed both WMF and third parties to develop a large number of useful features loosely coupled to our core, which has been able to remain relatively small. But the core should not be mistaken for a full MediaWiki install.

The first step to remedy the situation is to acknowledge third-party users of MediaWiki as a first-class Audience, so that we can devote the proper resources to their support. Our new "External Wikis" team could begin by sifting through Special:Version and identifying an expansive set of "standard" extensions and features, omitting only code which is highly WMF-specific (such as fundraising, messages, or internal metrics), appropriate only to very-large-scale wikis (PoolCounter?), or deprecated/abandoned (EasyTimeline?). But the large majority of the extensions running on WMF wikis should be included.[1][2][3] The External Wikis team would be evaluated on the number of external contributions to our stack and the number of external users running our "standard" extension set.[4]

We should then make these extensions easy to download and install. The standard installation guide should include the installation of these extensions, they should be distributed with default configurations which "just work", and any "special setup" required should be addressed. This work may include packaging container-based solutions for installing a "standard" wiki, such as those based on Vagrant, Docker, or Kubernetes. But it should also include refactoring "service" components for easy installation: by default services should also unzip into the core extensions directory and ship with "works by default" configurations.[5] Any required platform packages should be listed in a standard format.[6] A special service runner built into core will take care of forking any required long-lived processes (in the same way that our basic Scribunto install uses a forked Lua interpreter). Advanced configurations could use embedded or distributed services, but the default install will painlessly support single-server installs. In addition to our current service implementation languages (Java, node.js), decoupled services could even be written in "older" or "newer" versions of PHP as future needs warrant.
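
As a very rough sketch of the service-runner idea (the class, method names, and configuration shape below are all invented for illustration; a real implementation would live in core and read each service's configuration from its extension), the core piece need do little more than fork and supervise long-lived child processes:

```php
<?php
/**
 * Hypothetical sketch of the in-core service runner: fork one
 * long-lived worker per configured service, the way our basic
 * Scribunto install forks a Lua interpreter. All names and
 * configuration keys here are illustrative.
 */
class ServiceRunner {
	/** @var resource[] Process handles, keyed by service name */
	private $procs = [];
	/** @var array[] Pipe handles, keyed by service name */
	private $pipes = [];

	/**
	 * @param string $name Service name, e.g. "parsoid"
	 * @param string $cmd Command line, e.g. "node server.js"
	 * @param string $cwd The service's directory under extensions/
	 */
	public function start( $name, $cmd, $cwd ) {
		$spec = [
			0 => [ 'pipe', 'r' ], // child stdin
			1 => [ 'pipe', 'w' ], // child stdout
			2 => [ 'pipe', 'w' ], // child stderr (for logging)
		];
		$proc = proc_open( $cmd, $spec, $pipes, $cwd );
		if ( $proc === false ) {
			throw new RuntimeException( "Could not start service '$name'" );
		}
		$this->procs[$name] = $proc;
		$this->pipes[$name] = $pipes;
	}

	/** Report (and, in a real implementation, restart) dead services. */
	public function supervise() {
		foreach ( $this->procs as $name => $proc ) {
			$status = proc_get_status( $proc );
			if ( !$status['running'] ) {
				// Restart logic elided in this sketch.
				trigger_error( "Service '$name' exited", E_USER_WARNING );
			}
		}
	}
}
```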

The integration of services with the standard MediaWiki installation process will extend to localization and internationalization. The same MessagesDirs mechanism used in extension.json should allow services' messages to be localized as well, keeping them well integrated with translatewiki and the rest of our language infrastructure.
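
For instance (a sketch: the extension name and directory layout here are invented, but MessagesDirs itself is the existing extension.json mechanism), a service unpacked into the extensions directory would register its message directories just as extension PHP code does:

```json
{
	"name": "ExampleService",
	"MessagesDirs": {
		"ExampleService": [
			"i18n",
			"service/i18n"
		]
	}
}
```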

As part of this service integration work, the URL routing mechanism of the https://xx.wikipedia.org/api/rest_v1 API should be brought into core, and integrated such that new REST modules may be installed the same way any other MediaWiki extension is installed: by unzipping into the extensions directory (even if they are implemented in JavaScript). Some modules may eventually be rewritten in PHP for tighter integration, but no RESTbase module will require a PHP rewrite merely to be packaged as an extension: the choice of implementation language is independent of packaging. A small lightweight "service runner" can be provided to allow running these extension-packaged REST modules outside the MediaWiki framework; ops may even use this in production to bypass PHP request-routing overhead for certain request paths.[7]
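
For illustration, registration might look something like this once the rest_v1 router lives in core; the "RestRoutes" key and the JavaScript "module" variant are hypothetical, patterned on how extension.json registers other components. Note that one route is handled by a PHP class and the other by a JavaScript module, packaged identically:

```json
{
	"name": "PageSummaries",
	"RestRoutes": [
		{
			"path": "/page/summary/{title}",
			"method": "GET",
			"class": "PageSummaryHandler"
		},
		{
			"path": "/page/related/{title}",
			"method": "GET",
			"module": "service/related.js"
		}
	]
}
```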

In addition to ensuring that "standard" extensions and services are installed and configured by default, we should renew our focus on making our content reusable by third parties. Specifically, templates, modules and gadgets on WMF projects should allow easy reuse across wikis (for example, using Shadow Namespaces). We should allow third parties to reference property and item definitions from Wikidata in their own wiki installations, using the Wikibase client.[8] This goes further toward allowing external wikis to "work like Wikipedia does" out of the box.
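
Concretely, the goal is for a third-party LocalSettings.php stanza along these lines (a sketch using Wikibase Client's current setting names; per note 8, today this still requires direct database access to the repo) to "just work" against Wikidata purely over the API, the way InstantCommons does for images:

```php
<?php
// LocalSettings.php (sketch): point a local Wikibase client at the
// Wikidata repository. Setting names follow Wikibase Client today;
// the extension-loading line is elided. The proposal is that this
// should work purely over the API, with no direct access to the
// repo's database.
$wgWBClientSettings['repoUrl'] = 'https://www.wikidata.org';
$wgWBClientSettings['repoScriptPath'] = '/w';
$wgWBClientSettings['repoArticlePath'] = '/wiki/$1';
$wgWBClientSettings['siteGlobalID'] = 'mylocalwiki';
```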

The wikitext parser will be factored out of MediaWiki core, as described in the "zero parsers in core" proposal.[9] The existing legacy PHP wikitext parser will be moved into an extension. Parsoid will be repackaged as an extension using the new Parser interface, initially without a rewrite in PHP, so the Parser API will communicate with Parsoid running in node.js as before. Other trivial implementations of the Parser API may be created, such as a Markdown parser or an HTML-only wiki module, to demonstrate the full decoupling of core from wikitext. As a follow-up, an implementation of Parsoid may eventually be done in PHP using the new Parser API, but this rewrite could fail and is not on the critical path.
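
To illustrate the shape of the decoupling (everything below is a hypothetical sketch, including the interface name and the endpoint paths; the actual API will be worked out in the linked task), core would see only a narrow parsing contract, with Parsoid as one implementation behind it:

```php
<?php
/**
 * Hypothetical sketch of a core Parser interface for the "zero
 * parsers in core" proposal: core knows only this contract, and
 * wikitext support is provided by extensions implementing it.
 */
interface ContentParser {
	/** Convert markup (wikitext, Markdown, ...) to HTML. */
	public function parse( string $markup ): string;

	/** Round-trip HTML back to markup, where supported (Parsoid). */
	public function serialize( string $html ): string;
}

/** A Parsoid-backed implementation talking to node.js over HTTP. */
class ParsoidParser implements ContentParser {
	/** @var string Base URL of the Parsoid service */
	private $serviceUrl;

	public function __construct( string $serviceUrl ) {
		$this->serviceUrl = $serviceUrl;
	}

	public function parse( string $markup ): string {
		return $this->post( '/transform/wikitext/to/html', $markup );
	}

	public function serialize( string $html ): string {
		return $this->post( '/transform/html/to/wikitext', $html );
	}

	private function post( string $path, string $body ): string {
		$ctx = stream_context_create( [ 'http' => [
			'method' => 'POST',
			'header' => "Content-Type: text/plain\r\n",
			'content' => $body,
		] ] );
		$result = file_get_contents( $this->serviceUrl . $path, false, $ctx );
		if ( $result === false ) {
			throw new RuntimeException( 'Parsoid service unreachable' );
		}
		return $result;
	}
}
```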

The existing "storage engine" functionality of RESTbase (only one of the many modules currently underneath the REST API) will be reimplemented on top of Multi-Content Revisions. The multiple databases corresponding to our multiple projects will also be merged, facilitating cross-project features like simultaneous display of parallel texts in multiple languages. In-progress edits (editor switching, conflict resolution, content translation) will be stored in the main database, for example in the user namespace.[10] This will unify all of our storage in a single database layer and eliminate the need for Cassandra. This should simplify ops and (hopefully) reduce storage costs by eliminating some redundancy.
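
To make the idea concrete, here is a sketch of what storing Parsoid HTML beside the wikitext might look like under Multi-Content Revisions (the slot role and surrounding API are illustrative, patterned on the MCR proposal; $title, $wikitext, $html, $revisionStore and $dbw are assumed to be in scope):

```php
<?php
// Sketch only: with MCR, a revision becomes a bundle of named
// "slots", so the rendered HTML that RESTbase's storage module
// keeps in Cassandra today could instead be persisted as a slot
// beside the wikitext in the main database.
$rev = new MutableRevisionRecord( $title );
$rev->setSlot( SlotRecord::newUnsaved(
	SlotRecord::MAIN,                // the wikitext, as today
	new WikitextContent( $wikitext )
) );
$rev->setSlot( SlotRecord::newUnsaved(
	'parsoid-html',                  // derived-content slot role
	new TextContent( $html )
) );
// One storage layer persists both slots together: no Cassandra needed.
$revisionStore->insertRevisionOn( $rev, $dbw );
```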

We will lower the barrier-to-entry for third-party developers and erase some of the hard boundaries between template, Scribunto module, gadget, extension, skin, and core code. For web, the Marvin prototype will be continued, along with the development of a special "null skin" for core which would allow the existing PHP code in core to serve special pages and other bespoke UX as unwrapped HTML, which Marvin can clothe with an appropriate UX and skin. On mobile we will continue to move Android and iOS app code from native languages (Java, Swift) into PHP and JavaScript to enhance code reuse.[11] In core we'll continue to research the potential of projects such as php-embed and v8js to further blur the lines between server-side PHP and JavaScript. For editors, Scribunto/JavaScript will also be completed, allowing the creation of template code in JavaScript. Insofar as possible, the same APIs will be available in all four contexts. The ultimate goal should be to allow the creation of a full skin in JavaScript, templates in JavaScript, and the implementation of extensions and special pages in JavaScript.[12]
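
As one illustration, the "null skin" might amount to little more than a Skin subclass that emits the page body without any chrome (a schematic sketch; the class name and details are invented, and a production skin has considerably more to wire up):

```php
<?php
/**
 * Hypothetical "null skin" sketch: serve page content as bare,
 * unwrapped HTML so that a JavaScript layer such as Marvin can
 * supply the actual look and feel.
 */
class SkinNull extends Skin {
	public function outputPage() {
		$out = $this->getOutput();
		// No header, sidebar, or footer: just the body HTML plus
		// enough metadata for the client to clothe it in a UX.
		echo '<!DOCTYPE html><html><head><title>';
		echo strip_tags( $out->getPageTitle() );
		echo '</title></head><body>';
		echo $out->getHTML();
		echo '</body></html>';
	}
}
```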

This proposal would commit WMF resources to supporting a more complete "standard" distribution of MediaWiki on both single-server and containerized platforms. By standardizing the configuration and installation mechanisms for services we would retain the benefits of a decoupled architecture without falling into configuration/dependency hell; it would also expand the number of third parties able to run our services and contribute to their development. Decoupling wikitext from core would allow greater markup independence and clear the way for future innovation in the wikitext representation, leveraging Parsoid's successful innovation of round-trip conversion to let editors use their choice of visual or text-oriented markup editors. Moving to HTML-native storage for articles will also benefit performance and clear the path for future improvements such as incremental rendering and subtree editing. Finally, embracing JavaScript as an official "second" language for the project alongside PHP will expand our developer base; embracing JavaScript for templates would expand our editor base. Decoupling the UX from the PHP core would unleash further innovation in our presentation layer and allow us to create modern reactive user experiences.

  1. ^ Especially: Echo, VisualEditor, StructuredDiscussions, MultimediaViewer, Scribunto, ParserFunctions, Cite, SyntaxHighlight, TemplateData, TimedMediaHandler (and other extensions for broad media support), Wikibase Client (pointed at WMF like InstantCommons), Citoid, MobileFrontend (mobile support "as good as WMF" should be a standard install), OATHAuth (for 2FA, which should be a standard install), Thanks (and similar extensions to make the world a friendlier, more helpful place), Translate (and similar extensions to keep our place as "best wiki software for multilingual collaboration"), and the various "global" extensions to make it as easy as possible for wikis to reuse content from WMF.
  2. ^ Looking at mobile support: if we are investing the effort in making a good mobile experience (apps & mobile web), then it should be something which the broader public of MediaWiki users should also be using. That is, we shouldn't be building one product "for WMF" with all the "good stuff" and forcing the rest of the world to use something else: even if that something else is "good enough" for many users, it dilutes the developer and user base and erodes "with enough eyes all bugs are shallow", etc. If we build native apps, why wouldn't external users like NASA want to build a special "NASA App" to access their internal wikis, based on our code but with a few configuration changes? (And then maybe we can start to get code contributions back!) If WMF uses MobileFrontend but other wikis are supposed to use, say, a responsive skin coupled with TemplateStyles, then we're not working on the same project and can't collaborate with our users. We all should be eating the same dogfood.
  3. ^ And consider Wikidata: one of our major projects which practically speaking isn't installed at all outside of the WMF. A third-party wiki should be able to use a local Wikibase to write statements of the form "<my local image> <is a picture of the grave of> <Douglas Adams>", where the 2nd term ("property") and 3rd term ("item") live on Wikidata. Otherwise, the work required to create a full ontology and set of properties from scratch is overwhelming. This requires fixing some early design decisions that make the Wikibase client require direct database access.
  4. ^ That is, at a high level we're not going to identify which of our external audiences is "worth supporting" or "optimally mission aligned" and support only those configurations; instead the focus is on the ultimate goal (getting our code used, and getting contributions back), objectively scored by Gerrit commits and WikiApiary data. This metric is how this team relates to the WMF mission statement: every user who learns how to use "the wiki at work" is a user primed to edit Wikipedia (when it is running MediaWiki in essentially the same configuration as the WMF); every code contribution made to "the wiki at work" can be an aid to the core MediaWiki software.
  5. ^ See phab:T133320 for how this might work for one particular service.
  6. ^ Consider how Travis CI lists required dependencies, for example. We could have .debian, .ubuntu, .fedora, etc. files that list package dependencies in a machine-readable format that can automatically be tested (and thus kept up-to-date) with a containerized integration tester. A special script in maintenance/ could unzip a new extension and sudo-install its required dependencies in one step.
  7. ^ An analytics-based service (pageview API, for example) may wish to use a routing piece that is completely decoupled from MediaWiki; the overhead of packaging a REST component as an extension should be lightweight enough to easily allow the use of extension-packaged modules with a simple service runner for standalone services.
  8. ^ Currently this requires direct access to the WMF database server, so is impossible outside the WMF.
  9. ^ See phab:T114194 for a proof-of-concept.
  10. ^ Canonicalizing storage of "edited version of article X at revision Y" with standard APIs would also allow more robust fork/merge features, including improving the Draft namespace and prototyping a GitHub-like collaboration model which defers merge rather than imposing a quick revert on new editors.
  11. ^ We should continually try to minimize the amount of "things done in native land" in order to enhance code reuse between all our platforms. The whole point of the apps should be to do stuff which is really native and specific to the platform (especially platform-integration features), so the goal should be zero duplicated code between Android, iOS, and the mobile web view. If we find ourselves re-implementing some feature (say, offline support?) multiple times on different platforms we should take a hard look at whether there are techniques, such as moving more of the implementation server-side or into JavaScript, that would let us Write It Once. Our native clients should embrace the special features of the platform (otherwise why would folks use them?) but otherwise be as thin as possible.
  12. ^ Of course, the first adopters of JavaScript templates or JavaScript extensions will probably be small bold third-party or small-language wikis. Adoption of these features on WMF wikis will likely proceed conservatively, and I'm not advocating an immediate effort to port existing extensions and templates. But the External Wikis team can create training materials using the new features, providing on-ramps for new contributors that impose fewer "now switch programming languages and learn a new API" roadblocks.