In the (good ol') days when FTP and archie were king, it was fairly simple for developers to spread their offerings far and wide. I had scripts set up to drop the right files in the right locations, and it didn't much matter if there were two or twenty archives.
Enter the Web, and the focus shifts from pushing software out to archives to pulling people into Web sites. I think that's a good thing, because it puts more information into users' hands, but in the process, developers have lost the ability to easily push anything out. Instead, we have to go manually to a number of tracking sites (the more the better, usually), set up accounts, and edit essentially the same information on all of them.
My long-winded question is essentially this: Is there any interest in automating this process? I currently have a property list, which I use to build the dynamic pages of my site and which contains all or nearly all of the information gathered by the various software tracking sites. It is easily made available in plist or XML format, and it would be simple to convert to other formats if necessary. If a general software description file format can be agreed on, simply making that file available would give sites all the information they need to update their database entries. No fuss, no muss. Minimizing the administrative effort would really lower the barrier to entry for all sites.
BetterConsole is my newest piece of software, recently released, which has brought this issue to the surface. You can find my SPIF files for it in two formats:
The plist format might not be particularly easy to parse on non-NeXT/Apple systems, so I would be willing to write a converter that puts out a format that is easier to parse.
Keep in mind that this is a work in progress and represents only a first-pass effort at a format that contains sufficient information to satisfy most tracking software. Most tracking sites seem to consider a basic piece of software to have eight attributes: author, description, version, system, license, purchase, category, and package.
The author attribute identifies who owns the software. It does this via contact information by assigning values to the sub-attributes name (e.g., Joe Programmer) and location (e.g., http://www.someisp.com/~joe). Other sub-attributes such as email could also be added, though most tracking sites currently only seem to ask for the name and URL of the author.
The description attribute identifies the software in increasing levels of detail. Currently that is done with 3 sub-attributes: name, short, and long.
The version attribute identifies the version of the packaged software. This is done with 2 sub-attributes: revision and status.
The system attribute identifies what operating systems the software is for. This is done with 2 sub-attributes: name and version. It may be a good idea to make this an array instead, as some software may run on many different systems (the workaround would be to make a SPIF file for each system). It may also be a good idea to add further system requirements (RAM, HD, etc.), but that does not seem to be a major consideration from a tracking point of view.
The license attribute identifies the license the software is distributed under. This is done with 2 sub-attributes: name and location.
The purchase attribute gives information on purchasing the software. This is done with 2 sub-attributes: price and location.
The category attribute identifies how the software should be organized. In looking at the various tracking sites, there was really no consistency in the arrangement or naming of software categories. Additionally, most sites had additional fields for keyword descriptions of software (for search purposes). I'm hoping these features can be subsumed by this one category attribute. It should be considered a prioritized list of organization and search keywords. The software that scans the file should be able to look at this list and determine where it fits with all the other software that is being tracked. If it fails, I suppose it would be up to the tracker to either adjust the scanning software to be more robust or inform the developer of the error.
Leaving the most complicated for last, the package attribute identifies all the files that are associated with this software. Example sub-attributes are info, binary, and source. Each of these identifies a document that is related to this particular piece of software. That is done by further breaking the file information down into location, size, and checksum. I also included contact information, just in case the contact for, say, the binary might be different from the contact for the source, but I'm not sure that's really necessary.
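To make the attribute list above concrete, here is one possible way to model a SPIF file as Python dataclasses. It is only a sketch: the attribute and sub-attribute names follow the text, but the field types, the optional per-file contact, and the use of a dict keyed by names such as "info", "binary", and "source" for the package attribute are assumptions, not part of the proposal.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical model of a SPIF file: names follow the proposal, types are assumed.


@dataclass
class Contact:
    name: str       # e.g. "Joe Programmer"
    location: str   # e.g. "http://www.someisp.com/~joe"


@dataclass
class Description:
    name: str
    short: str
    long: str


@dataclass
class Version:
    revision: str
    status: str


@dataclass
class System:
    name: str
    version: str


@dataclass
class License:
    name: str
    location: str


@dataclass
class Purchase:
    price: str
    location: str


@dataclass
class PackageFile:
    location: str
    size: int                          # size in bytes (assumed unit)
    checksum: str
    contact: Optional[Contact] = None  # only needed if it differs from the author


@dataclass
class Spif:
    author: Contact
    description: Description
    version: Version
    system: System
    license: License
    purchase: Purchase
    category: List[str]                # prioritized keywords, highest priority first
    # Keys such as "info", "binary", "source" (hypothetical layout).
    package: Dict[str, PackageFile] = field(default_factory=dict)
```

Making the per-file contact optional reflects the author's own doubt about whether it is needed.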
[You can watch the original version of this document at http://www.subsume.com/spif/ for updates about the file format proposal. -- Ed.]
As Doc points out, there are two ways of getting your information through -- you can push it, or you can let people pull it. His proposal for letting sites pull the information from you means that you would only need to keep one piece of information up-to-date at all the sites on which you want your project listed: the URL of your project info file. If you want your project listed on freshmeat, gnu.org, and kde.org, you could just give them the URL, and, at a regular interval, a bot at each site would check whether the file has changed. If it has, the bot would compare the file with the information in the site's database and submit any differences as change requests to be reviewed by the site's staff. If the regular interval is too long, you could click a button on the site to ask it to check your info file immediately. You could even have an "info file URL" field in the info file, so you wouldn't have to go to the sites to keep the URL up-to-date. When you changed servers, you would put your files up at the new location and change the field in the file at the old location. The URL to the new info file would be submitted as a change request like any other. After allowing enough time for everyone to catch up, you could just remove the old files.
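The pull cycle described above (check whether the info file changed, then diff it against the site's record) might look roughly like this in Python. The sketch assumes the info file has already been parsed into a flat dict of "attribute/sub-attribute" keys, and it uses a SHA-256 digest of the raw file as one possible change test; neither detail comes from the proposal.

```python
import hashlib
from typing import Dict, List, Optional, Tuple

# A change request is (field, value in site database, value in info file).
ChangeRequest = Tuple[str, Optional[str], Optional[str]]


def content_changed(raw: bytes, last_digest: Optional[str]) -> Tuple[bool, str]:
    """Return (changed?, new_digest) by comparing a digest of the fetched file."""
    digest = hashlib.sha256(raw).hexdigest()
    return digest != last_digest, digest


def change_requests(db: Dict[str, str], info: Dict[str, str]) -> List[ChangeRequest]:
    """List every field whose value differs between the site record and the info file."""
    requests = []
    for key in sorted(set(db) | set(info)):
        old, new = db.get(key), info.get(key)
        if old != new:
            requests.append((key, old, new))
    return requests
```

Each resulting tuple would then go into the staff review queue rather than straight into the database.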
One big thing missing from his first draft is an "announcement" field. When you release a new version, you want it to show up on freshmeat's front page. Using Doc's scheme, you should simply be able to change the necessary fields in your info file, including one that lists what's new in this version, ready to appear in an announcement on freshmeat's front page, in the newsletter, on the newsgroups, etc.
This brings us to a problem I see in doing this. Ask any of the freshmeat staff, and they'll tell you that the number of items that are submitted and approved without any changes is quite small. Sometimes, there are errors in spelling and grammar. Sometimes, it's not clear what the contributor is trying to say, and we have to work it out with him or her. Sometimes, there are just changes that have to be made to make the submission match our editorial policies -- for example, we don't allow the name of the project to appear in the short description, and we insist that it appear in the long description (preferably in the first few words), we don't allow HTML in descriptions, etc.
Now, let's say you make changes to your info file, and we pick them up as change requests. What do we do?
Even when we get past that point and your info file is acceptable to us, you'll check your mail in an hour and see messages from two other sites saying that they need you to change x and y. When you change x, site number 3 will be unhappy, and when you change y, site number 1 will be unhappy. At this point, you'll wish you were just going to Web pages and filling out forms again.
The issues of what options to include in the file format can be overcome. Everyone who wants to take part in the system can get together and flame each other until they work it out. Dealing with policy issues and the editorial needs of all the sites is not as easy.
You might end up having tag attributes to accommodate different sites:
<announcement site="freshmeat.net">
(text acceptable to freshmeat.)
</announcement>
<announcement site="linuxdoc.org">
(text acceptable to the LDP.)
</announcement>
, etc. Whatever the solution, the problem would have to be dealt with. One size is not going to fit all.
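As a rough illustration of how a site could consume per-site tags like these, the following Python sketch uses the standard library's xml.etree.ElementTree to pick out the announcement meant for one site. The enclosing <spif> root element and the fallback to an announcement with no site attribute are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def announcement_for(xml_text: str, site: str) -> Optional[str]:
    """Return the announcement tagged for `site`, else an untagged one, else None."""
    root = ET.fromstring(xml_text)
    fallback = None
    for elem in root.iter("announcement"):
        target = elem.get("site")
        text = (elem.text or "").strip()
        if target == site:
            return text
        if target is None and fallback is None:
            fallback = text  # generic text used when no site-specific one exists
    return fallback
```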
Another idea that comes up from time to time is that of letting people submit information by email. Again, you would have a standard format for the information, only now it would be sent to the sites, where a script would parse it and submit the parts of it as change requests as needed.
The advantage to this is that you no longer have multiple sites trying to get you to change your info file to match their needs. They each receive your request and can contact you with any problems they have. You could have your XML info file in your build directory and have a rule in your makefile with a list of the addresses to which it should be sent and a command that will send it. Then:
make submit
I like that just for the coolness factor. :)
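For the email variant, the makefile's submit rule could simply run a small script. Below is a hypothetical Python version that builds one message per tracking site with the info file attached, using the standard email and smtplib modules. The sender, recipient addresses, subject line, attachment name, and SMTP host are all placeholders.

```python
import smtplib
from email.message import EmailMessage
from typing import List


def build_submissions(info_xml: str, sender: str, recipients: List[str]) -> List[EmailMessage]:
    """Build one message per tracking site, each carrying the same info file."""
    messages = []
    for rcpt in recipients:
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = rcpt
        msg["Subject"] = "Software info update"      # placeholder subject
        msg.set_content("Updated project info file attached.")
        msg.add_attachment(info_xml.encode("utf-8"), maintype="application",
                           subtype="xml", filename="project.spif.xml")
        messages.append(msg)
    return messages


def send_all(messages: List[EmailMessage], host: str = "localhost") -> None:
    """Send the messages through an SMTP server assumed to accept mail from this host."""
    with smtplib.SMTP(host) as smtp:
        for msg in messages:
            smtp.send_message(msg)
```

Keeping message construction separate from sending makes it easy to inspect what would go out before actually mailing every site.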
I have two questions for everyone:
Doc O'Leary (droleary@subsume.com) is a COG in the machine of Subsume Technologies, Inc. (http://www.subsume.com/). He is lazy, and has thus been an advocate of free software since 1996 and of object-oriented development for nearly a decade.
Changelog missing
A common piece of information distributed with announcements (e.g., on the GNOME announcements mailing list) is a list of changes in 'bullet-point' fashion.
How about a 'changelog' element, with 'change' element children?
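A minimal sketch of the suggested element, assuming a <changelog> root with one <change> child per item (the version attribute is an assumption), and of how a site might turn it into the bullet-point list mentioned above:

```python
import xml.etree.ElementTree as ET


def bullet_list(changelog_xml: str) -> str:
    """Render each <change> child of a <changelog> element as one bullet line."""
    root = ET.fromstring(changelog_xml)
    items = [(change.text or "").strip() for change in root.findall("change")]
    return "\n".join("- " + item for item in items if item)


example = """<changelog version="0.2">
  <change>Fixed crash on startup</change>
  <change>Added Spanish translation</change>
</changelog>"""
print(bullet_list(example))
# - Fixed crash on startup
# - Added Spanish translation
```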
Dependencies! (mandatory and optional)
It continuously amazes me that so many projects put up instructions on how to decompress a source tarball but give no information on dependencies (or bury it deep within the documentation, or scatter it among several places).
What I'd like: a tag specifying build dependencies (i.e. on libraries) meant to be read by human beings. Example:
Building foo app requires:
libFoo > 1.2.x
libBar (optional)
libQuuux from CVS
etc.
This would at least simplify the lives of package maintainers and integrators a little, and also the lives of people building straight from source.
Also, why limit it to one category? mpg123 could qualify under "audio" (it plays MP3s) as well as "console" (it's a console app) and "streaming" (it can play audio streams from the Net). This would eliminate the need for a deep tree like the one Freshmeat currently has. Think of them as metatags for software search engines: unlike pr0n spammers, I am confident that developers can make intelligent choices about categories.
Finally, if you are looking for good examples of what information you should include, have a look at the GNU Free Software Directory (http://www.gnu.org/gnulist/production/index.html).
About the incompatibility between sites
This shouldn't be a problem; after all, we are always writing code that works on different platforms!
I see two different solutions here:
The same solution we always use: write a standard, RFC style, describing how freshmeat-like sites should work.
The one the Freshmeat staff is talking about: put one entry per site in the file. This would be a pain in the ass, but so is the current situation; it's what we're doing right now! You might say, "Huh, then this won't be any easier!", and my answer is that it would be a little easier, because we wouldn't need to go across all the Web sites filling in forms...
I prefer the first solution anyway...
Now, about what the Freshmeat staff is asking us to think about:
About the fields currently available in a freshmeat appindex record:
Freshmeat is more or less the main site for Open Source software, so I would like to see a place on the site where developers can ask for help with certain parts of their projects. What Freshmeat could do is maintain a TODO list for each project and let Freshmeat users click on each item to see a fuller description, or reply to that item to help with the project.
How long to poll...
The pull method should probably include a field indicating how often to poll. A slow-moving project may be happy with once a week; a fast-moving project, once a day. There should also be a rule to determine when to mothball a project. If no updates occur in 6 months, email the developer. If no updates occur in 7 months, turn off automatic polling.
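The rules in this comment are easy to state in code. This Python sketch assumes the info file carries a poll interval in days and approximates a month as 30 days; the state names are illustrative only.

```python
from datetime import date, timedelta

# Thresholds from the comment, with a month approximated as 30 days (assumption).
EMAIL_AFTER = timedelta(days=6 * 30)   # no updates for ~6 months: email the developer
STOP_AFTER = timedelta(days=7 * 30)    # no updates for ~7 months: stop polling


def project_state(today: date, last_update: date) -> str:
    """Classify a project by how long it has gone without an update."""
    idle = today - last_update
    if idle >= STOP_AFTER:
        return "mothballed"
    if idle >= EMAIL_AFTER:
        return "email developer"
    return "active"


def poll_due(today: date, last_poll: date, poll_days: int, state: str) -> bool:
    """Poll when the interval from the info file has elapsed, unless mothballed."""
    if state == "mothballed":
        return False
    return today - last_poll >= timedelta(days=poll_days)
```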
deb files?
It seems to me that if everyone would simply make Debian packages, we would have all the information about a package and also the ability to update easily... or am I missing something?
Previous nearly-sufficient attempts
With some allusion to "Why doesn't everybody just build debians", there have been a few similar attempts to define the metainfo of a package so that it can be listed. I think this attempt is the closest. Why doesn't everyone just build RPMs? Why doesn't everyone just build Unix PKG files? SSOs? The list goes on.
To reverse the Debian question, why don't you just build Debian meta info from this file?
If this file had, in addition to lists of enclosed files, lists of where the file *came* from, permissions, and owner/group, then we could actually use this file to do a "make pkg" -- which is interpreted as "make a Debian" or "make an RPM/SSO/PKG/etc" depending on the system on which you're packaging your stuff.
Additions would have to be made for "add an inittab entry" or "add a System-V rc file" and such. This makes me think that a similar, generic "packaging" file is necessary, tightly linked to this file.
Not that problematic
Well, perhaps it is. One thing, though, only the initial release of a software package needs updates of most information. The announcements of new versions would only include:
Changes
New file location
New version number
Perhaps a bit more, but my point is that I can bear the burden of registering with many sites the first time, but later on when I perform an update, it gets tiresome to surf the web forms.
KISS
Responding to some of the issues raised
Push vs. Pull
When it gets right down to it, it's really all pushing for the developer and all pulling for the tracking site. The question for the tracking site is whether it wants to use an interrupt pull or a (sorry about this . . . :-) poll pull. If I had a tracking site, what I would be inclined to do is, after accepting the initial SPIF submission, send an email with a URL in it that the developer could simply click to force an update. In addition to being hard on the servers, polling doesn't seem to me to be particularly desired by developers; is anyone really willing to wait even a day before telling someone about their new software?
Missing attributes
I left an attribute for changes off because, to be honest, I really don't use it (same thing goes for dependency information). Logically, if I'm a new user of a piece of software, I care more about the descriptions of what the software is and how it is useful to me than what has been changed (i.e., I have no basis for evaluating the changes). If I'm a current user of a piece of software, just knowing that it has been updated is usually enough for me to either download it or read the updated documentation, or do nothing initially if I'm happy with the way this version is performing.
But I do agree that it would be handy to have in the SPIF file itself. I like the suggested changelog/change elements, and I'll add them to the SPIF document as a proposal. Dependency information probably needs more discussion as to what you actually refer to as the dependency (I'd be inclined to go with a SPIF URL).
Site requirements/corrections, invalid submissions, etc.
I fully expect sites to apply whatever standards verification they have on the submitted file, just as they would on a submitted form page. If the submission has problems, the developer gets a response from the web server (or an email if your server is actually doing polling) telling them to fix their damn file! If it's accepted but there is manual correction that needs to be done by site staff, I'd have the developer emailed those updates so that they can fix their file for the next submission. The tracking site is absolutely free to not accept problem SPIF URLs if the developer refuses to make corrections.
Conflicting site requirements
While this can become an issue in theory, I don't see it becoming one in practice. The SPIF format should be flexible enough to accommodate various sites. If you come across an attribute you don't like, just ignore it. The category attribute list is one such example. The site simply goes through the list (considering it a hierarchy, if necessary) until it can match a listed category with a site category, and rejects the submission if there isn't one. The developer is free to put site-specific categories on that list without it affecting other sites. Hopefully all attributes can be so flexible (if necessary).
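The category-matching rule described above amounts to a first-match search over the developer's prioritized list. In this Python sketch the site's table mapping lowercase keywords to site categories is hypothetical, and case-insensitive matching is an assumption.

```python
from typing import Dict, List, Optional


def match_category(spif_categories: List[str], site_categories: Dict[str, str]) -> Optional[str]:
    """Return the site category for the first listed keyword the site knows, or None to reject."""
    for keyword in spif_categories:            # walk in the developer's priority order
        site_cat = site_categories.get(keyword.lower())
        if site_cat is not None:
            return site_cat
    return None                                # no match: the site rejects the submission
```

For example, a site whose table contains only "console" would file a project listing Application, Console, System under its console category and simply skip the other two keywords.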
File format
Some are rightly concerned about the file format, and others have correctly noted that it's pretty easy to map between formats. The most important thing to me is that tracking sites start accepting submissions with some kind of non-manual entry. Ideally, we can work out a base format that is easy to parse and either use directly or convert to other formats. If you don't like the plist format or think XML (even a proper DTD) is overkill, it can be something flatter like:
author/name Subsume Technologies, Inc.
author/location http://www.subsume.com
category Application
category Console
category System
...
What's important is that it contains sufficient information. I think we're getting there, so it comes down to finding a base format that is expressive enough and easy enough to work with. I have no particular preference of my own. So my question to tracker sites like freshmeat and the developers who submit software is what kind of format are you willing to work with?
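To show how little work the flat format needs, here is a minimal Python parser for it. It assumes the first run of whitespace separates the attribute path from its value, that repeated keys such as category keep their order (so the list stays prioritized), and that blank lines and lines starting with "#" can be ignored; the comment convention is an assumption, not part of the example above.

```python
from typing import Dict, List


def parse_flat(text: str) -> Dict[str, List[str]]:
    """Parse "path value" lines into {path: [values in file order]}."""
    fields: Dict[str, List[str]] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):   # skip blanks and assumed comments
            continue
        parts = line.split(None, 1)             # split on the first run of whitespace
        key = parts[0]
        value = parts[1] if len(parts) > 1 else ""
        fields.setdefault(key, []).append(value)
    return fields
```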
We should be making software HARDER to announce...
All too often, writing free software seems like a form of masturbation. Sites like freshmeat, sourceforge and advogato speak for the authors of free software, but nobody speaks for the users of free software. Until someone does so, free software will remain a niche market.
Freshmeat is the modern incarnation of shovelware. Tens of software packages are announced each day, enough to overwhelm the user interface and require us to click many times just to read the descriptions of today's latest software. All too often, the description isn't very descriptive, and I have to go to the program's home page to see if it's something I'm interested in. About half the time, the home page is on source forge, and most sites on source forge don't have a single screen shot, a single page of documentation, or any explanation of why this IRC client is different from the thousands of other IRC clients that other people have written. It's so bad that it almost seems like screen shots, documentation, or an explanation of what a program does are forbidden on source forge.
A CTO who wanted to evaluate the software being promoted on Freshmeat could spend a few hours a day at this task, as could anybody involved in the free software community. At some point I realized that I had a choice:
(1) I could write free software, or
(2) I could waste my time reading about free software on freshmeat
Because a large amount of low-quality software without documentation or even a paragraph explaining what it is already exists, I think it's actually harmful to make it easier to announce free software. Those who write 1% of a project, put it on source forge, and hope that the rest of us will finish it waste a scarce resource -- the time of people who are evaluating free software for use.
By creating barriers, such as knowing enough HTML to create a web page, or being able to set up your own web server or CVS server, we can ensure that the promoters of free software are people that are capable of COMPLETING a serious project.
Really, what we need is a way to prevent people from announcing free software UNLESS they've written at least minimal documentation for it.
Push VS Pull and making software harder to announce
It would be much easier for maintainers of such databases (I keep such a thing for Mac OS X/WebObjects stuff; Doc approached me about this a week or so ago) if this were a push situation rather than pull.
Push requires one person to notify one or several sites at once, rather than every site polling every developer daily. If we had a cross-platform submission method that let the developer pick which sites to notify (updating the list automatically from a central source), that would be useful.
Making Software Harder to Announce
I'm also frustrated by the lack of thought that often goes into software announcements. However, rather than making announcing harder, I think the individual sites need to perform some level of moderation, perhaps scoring first submissions appropriately.
re: scoring
I think (registered fm) users should be able to vote on three things: documentation, stability, and features (many or few could be good or bad). Then users could revise their score for a program if another release is better or worse. This would also help me as a developer, so I could fix things, because I never really get any feedback at all. I think most users get the impression that _all_ developers are very busy and are insulted to get mail from users. If someone doesn't like something I wrote, I want them emailing me why so I can do something about it.
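The voting scheme suggested here could be kept about as simple as this Python sketch, in which each registered user holds one ballot per program, so casting a new vote revises the old one. The 1-to-5 scale and the class name are assumptions.

```python
from statistics import mean
from typing import Dict, Tuple

ASPECTS = ("documentation", "stability", "features")


class Scores:
    """Per-program ballots keyed by user; a later vote replaces the earlier one."""

    def __init__(self) -> None:
        self.votes: Dict[str, Tuple[int, int, int]] = {}

    def vote(self, user: str, documentation: int, stability: int, features: int) -> None:
        ballot = (documentation, stability, features)
        if not all(1 <= v <= 5 for v in ballot):   # assumed 1-5 scale
            raise ValueError("scores must be between 1 and 5")
        self.votes[user] = ballot                  # overwrites any earlier ballot

    def averages(self) -> Dict[str, float]:
        if not self.votes:
            return {}
        return {aspect: float(mean(b[i] for b in self.votes.values()))
                for i, aspect in enumerate(ASPECTS)}
```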
make submit
make submit should be ``make petition'' or ``make announcement'' (harder to type) as it's closer to real-world language and scans better. You are petitioning the announce sites to publish your announcement/update.
Pulling content
Why not go half-way at least? The announcements could be a new gnu (heh) standard file (ANNOUNCE) in the project directory with the same concept as ChangeLog, except the more human-centered version. This way, an author could _either_ enter an announcement _or_ point their freshmeat.net entry to their web/ftp-uploaded announcement file. Freshmeat would grab the most recent announcement and treat it the same way they currently are (as though the author had typed it in).
Copy the Media!
They copy one another. Web sites should also talk to each other...
So you could write a web service to allow a person to enter the necessary information, and then let the server make submissions to all the other news sites! Or, for a pull technology, it could accept "what has changed since date x" requests from participating free software sites. This would result in much less overall bandwidth being used.
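On the pull side, answering a "what has changed since date x" request can be as simple as filtering on a last-modified timestamp. The record layout below (a dict with a "modified" datetime) is an assumption for illustration.

```python
from datetime import datetime
from typing import Dict, List


def changed_since(records: List[Dict], since: datetime) -> List[Dict]:
    """Return records modified strictly after `since`, oldest change first."""
    recent = [r for r in records if r["modified"] > since]
    return sorted(recent, key=lambda r: r["modified"])


records = [
    {"name": "foo", "modified": datetime(2000, 11, 2)},
    {"name": "bar", "modified": datetime(2000, 10, 1)},
    {"name": "baz", "modified": datetime(2000, 11, 1)},
]
print([r["name"] for r in changed_since(records, datetime(2000, 10, 15))])
# ['baz', 'foo']
```

Only the changed entries cross the wire, which is where the bandwidth saving comes from.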
KISS - keep it simple, stupid
I don't think it's necessary to discuss a standard or the interchangeability of the information. What is a nuisance instead is the login process, after which I am presented with the fields I have to change.
"make announce" sounds cool anyway. To make it work, I want freshmeat to send back a form sheet of the latest announcement. The next time I am around to make a re-announcement, I can simply adapt the fields, as long as the syntax is intuitive. And as long as it is intuitive, I don't care about a specific format or conversion tools: just edit the thing for each site the way it wants it, as long as I am able to cut and paste on my local computer. It would even be sufficient if I were allowed to paste the form information online into a single field.
I still have to log in, though; that's an authentication matter. Authentication would be a bit complex with email, however (I still don't have a PGP key). Well, there's no need for that if you go with the URL+pull+trigger idea. This means that I set up a Web site once (which would need authentication there), push the announcement information there (it's just a file, anyway), and simply send a trigger impulse to freshmeat so it looks up the URL it has been told about. And the trigger impulse is as simple as a "make announcement" call. And please, no need for micro-emails here; just make it a special CGI URL at freshmeat that triggers the pull. There are enough command-line utilities that can do an HTTP GET.
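The trigger described above needs nothing more than an HTTP GET. In this Python sketch the CGI endpoint and its "project" query parameter are hypothetical; any command-line HTTP client could send the same request.

```python
from urllib.parse import urlencode
from urllib.request import urlopen


def trigger_url(endpoint: str, project: str) -> str:
    """Build the GET URL that asks a tracking site to re-read a project's info file."""
    return endpoint + "?" + urlencode({"project": project})


def trigger(endpoint: str, project: str, timeout: float = 30.0) -> int:
    """Send the trigger and return the HTTP status code.

    With urllib's default handlers, 4xx/5xx responses raise urllib.error.HTTPError.
    """
    with urlopen(trigger_url(endpoint, project), timeout=timeout) as resp:
        return resp.status
```

A "make announcement" rule would then only need to run something like trigger("http://tracker.example/cgi-bin/pull", "betterconsole"), with both values being placeholders.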
Poll rates
Someone mentioned that if a project wasn't updated in a while, polling for it would eventually be abandoned. Another idea was to specify how often to check back in the info file itself. I would suggest making the server figure it out itself. Each time a check finds no update, the server can increase the wait interval. This would make it adapt to the update rate, which would be nice because a newly started project often has frequent changes, and after it becomes more stable the updates are less frequent.
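One way to let the server "figure it out itself" is exponential backoff: double the wait after each check that finds nothing new, and reset it when an update appears. The one-day floor, 30-day cap, and doubling factor in this Python sketch are assumptions.

```python
from datetime import timedelta

MIN_INTERVAL = timedelta(days=1)    # assumed floor, used right after an update
MAX_INTERVAL = timedelta(days=30)   # assumed cap for very quiet projects


def next_interval(current: timedelta, found_update: bool) -> timedelta:
    """Reset the wait after an update; otherwise double it, up to the cap."""
    if found_update:
        return MIN_INTERVAL
    return min(current * 2, MAX_INTERVAL)
```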
Trying to fix the mess in those packages...
I see one other piece of information which may be interesting.
Take for example the program MySQL, which has one appindex record in freshmeat (that makes perfect sense as the average user would see MySQL as one single unit, maybe two if you consider the client and the server side).
Check now the related Debian information (Debian is only a sample here, you could probably do the same for RedHat or Mandrake). The files which 'belong' to MySQL in Potato are :
- mysql-gpl-doc
- mysql-manual
- mysql-gpl-client
- mysql-doc
- mysql-server
- mysql-client
For some packages you can have libs, dev packages, source packages... which is enough to create huge confusion in beginners' minds.
We're still talking about packages. Freshmeat has an idea of packages which is close to the user's idea of packages yet distributions have an idea of packages which is more programmer-oriented.
In my point of view it would be *very* interesting to have a link from the concept 'MySQL' to all the distribution packages which correspond to this concept, and even a link to other concepts like 'kmysql-php' which just don't make sense without MySQL. That's more or less the 'dependency' idea applied to packages.
We've got to keep moving on this; it's an important step in merging the different packaging systems for Linux. It may still be a utopia, but it's so important for users.
Am I the only one who thinks this way?
NO ! NO POLLING !! This is better IMHO ...
Polling is of course an evil thing!
A =MUCH= better idea IMHO is that when you have a release, you take your description file and email it to the sites. Each announcement site would have an email account that just grok'd the file and updated the database. Then the authors only have to add the sites to an email list/group and send it out to all of 'em at once. This would stop silly polling while allowing the author to send out changes with a simple file attachment.
Also, to make sure we don't fake emails, I think a GPG/PGP signature would be called for. I know some people are gonna fuss if required, but I really think authenticated/signed email is gonna become a necessity anyway, might as well get used to it.
Those that don't like signatures would probably have to go to the site and manually log-in and then upload the description file, still a bit faster than going all over. It would also be nice if some of the major sites allowed the others to request the description files from their database by date and/or type or allow remote SQL searches.
I do like the idea of a standard format description file, and I do like the idea of adding dependency information up front. In fact, when browsing freshmeat, you should be able to click the dependency so you know where to get it if you don't have it.
Does anyone else hate downloading something, finding out you need another file, and then going back on the Net to find it? Then you download it, compile it, install it, go back to what you were trying to do in the first place, and then find another dependency you need... ARGH!