PHP/JavaScript programming help wanted for manx

legalize

You may know about the manx documentation database. What you may not realize is that the entire implementation of this website is open source on GitHub.

I work on it when I find slices from my bucket of "copious spare time".

After VCFMW there was some discussion of making some improvements to manx. I will eventually get to them, but it would go faster if some other people were to help out.

The code is heavily unit tested and in the past week or so I've been modernizing the code to PHPUnit 6, which simplifies some of the testing.

The upcoming work is not particularly difficult, but as I said, it would go faster with some help.

Take a look at the GitHub repository and if you think you're up for it, let's talk about how you can help.

Thanks!
 
I've been doing a bunch of work on the develop branch recently. In general, the PHP community infrastructure has advanced significantly since I started coding this application around 2010. When I started coding this, the PHP community didn't have a package manager and finding high quality libraries for writing a PHP application was difficult if you didn't work in this space on a daily basis. I've spent some time catching up to "community norms" for PHP applications and have tried to make the code base more modern and normative instead of home brew and full of NIH/DIY-isms.

The next release will be 2.0.7, a minor release containing some bug fixes and one feature: automatic ingestion of most bitsavers docs.

Here's an overview of work that's been done on develop for the next release:
  • Added travis-ci.org continuous integration builds that verify the tests keep passing (modernization)
  • Switched to using composer for PHP package management (modernization)
  • Updated to use PHPUnit mock objects instead of hand-crafted test doubles in the unit tests (modernization)
  • Eliminated a bunch of code and database duplication between processing of bitsavers.org and chiclassiccomp.org IndexByDate.txt files for automatic ingestion (refactoring)
  • Switched from home-brew dependency injection to using Pimple\Container (modernization)
  • Resurrected all the open issues from the old CodePlex project and recreated them on GitHub (open bugs, community)
  • Fixed some small bugs that were open on CodePlex (open bugs)
  • Added a wiki page that gives an overview of the code architecture (community)
Remaining work to do:
  • I have a little bit more testing and bug fixing to do on develop (mostly around the IndexByDate processing) (refactoring)
  • Ingest changes to IndexByDate files more efficiently (performance)
  • Automatically ingest documents from IndexByDate files (feature)
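
For anyone curious what the IndexByDate processing involves, here is a minimal sketch of the idea, not the actual manx code (which is PHP). It assumes each line of IndexByDate.txt looks like `YYYY-MM-DD HH:MM:SS relative/path`; that format, the `IndexEntry` type, and the function names are all assumptions made for illustration. Filtering on a stored "last seen" timestamp is one simple way to avoid reprocessing the whole file on every run.

```python
from datetime import datetime
from typing import Iterable, List, NamedTuple


class IndexEntry(NamedTuple):
    """One line of an IndexByDate.txt file (assumed layout, see above)."""
    modified: datetime
    path: str


def parse_index(lines: Iterable[str]) -> List[IndexEntry]:
    """Parse index lines, silently skipping anything that does not fit."""
    entries = []
    for line in lines:
        # Split off the date and time; the rest is the path (may contain spaces).
        parts = line.strip().split(" ", 2)
        if len(parts) != 3:
            continue
        date, time, path = parts
        try:
            modified = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue
        entries.append(IndexEntry(modified, path))
    return entries


def new_entries(entries: List[IndexEntry], last_seen: datetime) -> List[IndexEntry]:
    """Keep only entries modified after the last ingestion run."""
    return [e for e in entries if e.modified > last_seen]
```

A real implementation would persist `last_seen` in the database between runs and would also need to handle documents that were moved or deleted, which this sketch ignores.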

The main thing that is lacking in the application right now is editing existing data when authenticated as a logged-in user. This will be more important with automatic ingestion of bitsavers documents based on the IndexByDate.txt processing.
I'm looking at the Slim framework to make it easier to write pages for editing data. In general, I think the way to move forward with new features is to expose REST API endpoints through Slim and then have front-end JavaScript that performs the operations via AJAX calls. Slim also has facilities for rendering pages from templates, which will make it easier for writing new pages in a sane manner. Some edit pages and corresponding endpoints will end up forming the next release, tentatively named 2.1.0.
 
A basic implementation of automatic document ingestion has been created and is undergoing testing. It still needs some tweaking to identify documents whose titles begin with something that looks like a part number.
 
I've been testing the automatic ingestion and I need to improve the recognition of part numbers from the URLs. Because different companies use different forms of part numbers, I'm thinking of a set of regexes associated with a company and/or subdirectory to match part numbers from file names. If the pattern matches, then it's assumed to be a part number and is used for adding the document. Otherwise, I'm just getting too many bogus documents added.
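
Roughly, the idea looks like the sketch below. This is Python for illustration only, not the manx code, and the two patterns and the sample file names are hypothetical stand-ins I made up to show the shape of the approach; real patterns would have to be curated against the actual directories.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

# Hypothetical patterns keyed by directory prefix. Each regex must match
# at the start of the file name and capture the part number in group 1.
PART_NUMBER_PATTERNS = [
    ("dec/", re.compile(r"^([A-Z]{2}-[A-Z0-9]{5}-[A-Z]{2}(?:-\d{3})?)_")),
    ("ibm/", re.compile(r"^([A-Z]{1,2}\d{2}-\d{4}(?:-\d+)?)_")),
]


def extract_part_number(path: str) -> Optional[str]:
    """Return the part number for a document path, or None if no
    pattern registered for its directory matches the file name."""
    stem = PurePosixPath(path).stem  # file name without extension
    for prefix, pattern in PART_NUMBER_PATTERNS:
        if not path.startswith(prefix):
            continue
        match = pattern.match(stem)
        if match:
            return match.group(1)
    # No pattern matched: treat the document as not auto-ingestible.
    return None
```

With these made-up patterns, `extract_part_number("ibm/360/GA22-6821-7_360PrincOps.pdf")` yields `GA22-6821-7`, while a file under a directory with no registered pattern yields `None` and would be left for manual curation.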

As a result, it looks like 2.0.7 won't ship this weekend, maybe next weekend. When developing heuristics like this, you just have to test repeatedly against the typical dataset so you can improve the heuristic. Due to the open-ended nature of the test/revise cycle, it's hard to say when it will be good enough to deploy, but we're getting much closer.
 
Work has been a little crunchy lately, so 2.0.7 is making progress but has slipped a few weeks. It turns out that ingesting documents from the bitsavers IndexByDate.txt is possible, but with lower accuracy than I would like, so it will need improving over time by building up heuristics for the various directories in bitsavers. This likely means that a human will curate a pattern for a directory, allowing all the docs in that directory to be ingested, instead of curating the ingestion of each individual document.

I'm going to keep the limited ingestion I have now as part of 2.0.7, fix the remaining open issues for the 2.0.7 milestone and then work on improving the automatic ingestion for 2.1.0.
 
A bunch of smaller issues were fixed today, bringing 2.0.7 much closer to release. One minor enhancement and one bug fix left and 2.0.7 will be ready for deployment.
 
OK, all the issues relating to milestone 2.0.7 have been completed and the development branch has been merged to master. I'll probably deploy this to the production server this weekend. Basic automatic ingestion from bitsavers is present, but I'm not happy with the results of the heuristic when deployed globally. So I'm going to continue to improve this for 2.1.0 to increase the accuracy of the extracted metadata.
 
Version 2.0.7 has been deployed and is live.

I'm continuing development with milestone 2.1.0. There are two main goals: 1) improve the heuristics of automatic document ingestion from bitsavers and 2) make the site mobile friendly.

I added goal 2) after browsing the site on my phone and noticing that the font sizing was pretty hideous and not very usable :)
 
I've split some work from milestone 2.1.0 to produce milestone 2.2.0 and milestone 3.0.0. This will let me deploy improvements to the document curation process more quickly.

Milestone 2.1.0 improves the document curation user interface to make it easier to manually curate documents from the IndexByDate.txt file. This was basically a feature request from Silent 700 some time ago. (ChiClassicComp also creates an IndexByDate.txt, like bitsavers, to aid in assisted/automatic document ingestion.)

Milestone 2.2.0 will be improvements for automatic document ingestion. (This is all busted right now anyway because classiccmp.org/bitsavers.org blew up.)

Milestone 3.0.0 will focus on mobile friendly rendering.
 