Issue276

Title: Flickr stats move to database (out of CSV)
Priority: bug
Status: in-progress
Project: Metrics
Milestone:
Superseder:
Nosy List: mlinksva, nyergler, paulproteus
Assigned To:
Keywords:

Created on 2009-04-07.05:55:38 by paulproteus, last changed 2010-07-19.17:44:28 by nyergler.

Messages
msg1039 (view) Author: paulproteus Date: 2009-04-30.23:40:21
Just updated the nightly ML-based estimation code to read from the database.

Next step: Update CSV dumping code to dump this data nightly.
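
A rough sketch of what the nightly CSV dump could look like, not the actual dumping code. It uses Python's built-in sqlite3 as a stand-in for the real database; the function name dump_site_to_csv, the column order in the output, and the sample row are all assumptions made for illustration.

```python
import csv
import io
import sqlite3

# In-memory SQLite database standing in for the real stats database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_specific ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "license_uri VARCHAR(255) NOT NULL, "
    "site VARCHAR(255) NOT NULL, "
    "count INT NOT NULL, "
    "timestamp DATETIME NOT NULL)"
)
# Made-up sample row; the count is not real data.
conn.execute(
    "INSERT INTO site_specific (license_uri, site, count, timestamp) "
    "VALUES (?, ?, ?, ?)",
    ("http://creativecommons.org/licenses/by/2.0/", "Flickr", 120,
     "2009-04-30 00:00:00"),
)


def dump_site_to_csv(conn, site, out):
    """Write every row for one site to `out` as timestamp,license_uri,count."""
    writer = csv.writer(out)
    cursor = conn.execute(
        "SELECT timestamp, license_uri, count FROM site_specific "
        "WHERE site = ? ORDER BY timestamp, license_uri",
        (site,),
    )
    writer.writerows(cursor)


buf = io.StringIO()
dump_site_to_csv(conn, "Flickr", buf)
```

In a cron job, `out` would be an open file rather than a StringIO buffer.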
msg1037 (view) Author: paulproteus Date: 2009-04-30.22:57:05
Just updated the nightly cron job so that it inserts into the DB.

Next step: Update nightly ML-based estimation code to read from DB.

Next next step: Update CSV dumping code to dump this data nightly.

Next next step: Migrate nightly CSV-writing code to write to this.

Next next next step: Create a new CSV writer from this DB.
msg1026 (view) Author: paulproteus Date: 2009-04-30.18:48:44
Migrated existing CSVs.

Next step: Migrate all CSV reading code to this.

Next next step: Migrate nightly CSV-writing code to write to this.

Next next next step: Create a new CSV writer from this DB.
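
For reference, a minimal sketch of how a CSV-to-table migration of this kind might look. The layout of the existing CSVs is not described in this issue, so the column order (timestamp, license URI, count) and the sample rows below are assumptions, and sqlite3 stands in for the real database.

```python
import csv
import io
import sqlite3

# Made-up sample standing in for one historical Flickr CSV (layout is assumed).
sample_csv = io.StringIO(
    "2009-04-01 00:00:00,http://creativecommons.org/licenses/by/2.0/,100\n"
    "2009-04-02 00:00:00,http://creativecommons.org/licenses/by/2.0/,110\n"
)

# In-memory SQLite database as a stand-in for the stats database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE site_specific ("
    "id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "license_uri VARCHAR(255) NOT NULL, "
    "site VARCHAR(255) NOT NULL, "
    "count INT NOT NULL, "
    "timestamp DATETIME NOT NULL)"
)

# One parameterized INSERT per CSV row; the `with` block commits them together.
with conn:
    conn.executemany(
        "INSERT INTO site_specific (license_uri, site, count, timestamp) "
        "VALUES (?, ?, ?, ?)",
        ((uri, "Flickr", int(count), ts)
         for ts, uri, count in csv.reader(sample_csv)),
    )

migrated = conn.execute("SELECT COUNT(*) FROM site_specific").fetchone()[0]
```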
msg1019 (view) Author: paulproteus Date: 2009-04-29.19:51:53
Table created (er, I lost half an hour to the fact that utc_timestamp is
apparently a reserved word in MySQL, so I struggled to figure out why I couldn't
use it as a column name).
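
One way around a reserved-word clash like this, sketched under the assumption that the database is MySQL (which the AUTO_INCREMENT syntax elsewhere in this issue suggests): quote the identifier with backticks. The helper name quote_mysql_identifier and the table name "example" are hypothetical; renaming the column works just as well.

```python
def quote_mysql_identifier(name: str) -> str:
    """Wrap a name in backticks, doubling any backtick inside it."""
    return "`" + name.replace("`", "``") + "`"


# Hypothetical table name; only the column name comes from the message above.
ddl = "CREATE TABLE example ({} DATETIME NOT NULL)".format(
    quote_mysql_identifier("utc_timestamp")
)
```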

Next step: Migrate existing CSVs into the Flickr DB.

Next next step: Migrate all code that uses those CSVs to jam this into the DB.
msg982 (view) Author: mlinksva Date: 2009-04-23.01:09:52
I like the 2nd table.
msg981 (view) Author: paulproteus Date: 2009-04-23.00:58:38
ACK, in progress

Mike, right now we have two tables: "simple" which stores linkback results, and
"complex" which stores the results of Google and Yahoo API queries for
CC-licensed content (which we never ever use).

I was thinking of adding a third table whose structure mirrors "simple", but
instead I could be convinced to jam the Flickr data into "simple".

The structure of simple is:

simple (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    license_uri VARCHAR(255)  NOT NULL,
    search_engine VARCHAR(255) NOT NULL,
    count INT NOT NULL,
    timestamp DATETIME NOT NULL,
    country VARCHAR(255),
    language VARCHAR(255)
);

We could jam the Flickr data in by setting search_engine='Flickr', and
country=NULL and language=NULL. One reason I don't like that approach is that
the Flickr results are more canonical than the search engine ones. For that
reason, I favor a second table (in the same database):

site_specific (
    id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
    license_uri VARCHAR(255) NOT NULL,
    site VARCHAR(255) NOT NULL,
    count INT NOT NULL,
    timestamp DATETIME NOT NULL
);

ML, feel free to write "don't care", but I wondered: when you wrote "added to
the main stats database," did you mean adding it to the same table? If you don't
mean that, I'll quickly start a fresh separate table.
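
To make the separate-table option concrete, here is a small sketch using Python's built-in sqlite3 so it runs without a server. The schema above uses MySQL syntax (AUTO_INCREMENT); the SQLite spelling below, the license URI, and the count are assumptions for illustration, not real data.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the real stats database (assumption).
conn = sqlite3.connect(":memory:")

# SQLite spelling of the proposed site_specific table.
conn.execute(
    """
    CREATE TABLE site_specific (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        license_uri VARCHAR(255) NOT NULL,
        site VARCHAR(255) NOT NULL,
        count INT NOT NULL,
        timestamp DATETIME NOT NULL
    )
    """
)

# One made-up Flickr observation, keyed by a generic 2.0 license URI.
conn.execute(
    "INSERT INTO site_specific (license_uri, site, count, timestamp) "
    "VALUES (?, ?, ?, ?)",
    ("http://creativecommons.org/licenses/by/2.0/", "Flickr", 12345,
     "2009-04-22 00:00:00"),
)

rows = conn.execute(
    "SELECT site, count FROM site_specific WHERE site = ?", ("Flickr",)
).fetchall()
```

Data from other sites would go into the same table with a different value in the site column, so no schema change is needed per site.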
msg702 (view) Author: paulproteus Date: 2009-04-07.05:55:38
Writes Mike:

> Flickr data should be added to the main stats database (keyed with 2.0
> generic URIs), as should any other site specific data we can gather in
> the future.  We can regenerate historical estimates at any time based on
> the latest historical data (possibly some for other sites will turn up).
History
Date                 User         Action  Args
2010-07-19 17:44:28  nyergler     set     assignedto: paulproteus ->
2009-04-30 23:40:22  paulproteus  set     messages: + msg1039
2009-04-30 22:57:05  paulproteus  set     messages: + msg1037
2009-04-30 18:48:44  paulproteus  set     messages: + msg1026
2009-04-29 19:51:53  paulproteus  set     messages: + msg1019
2009-04-23 01:09:52  mlinksva     set     messages: + msg982
2009-04-23 00:58:38  paulproteus  set     status: unread -> in-progress; messages: + msg981
2009-04-22 17:09:21  nyergler     set     assignedto: paulproteus
2009-04-07 05:55:38  paulproteus  create