A few small tips

First, a couple about git.

git subtree

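This can be found on GitHub. Copy the script into git's libexec directory and its documentation into the corresponding man/1 directory, and it is ready to use. Its main purpose is to split part of an existing project out into its own repository. The commands below do that: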

git subtree split --prefix=some/prefix --annotate="(split) " -b some-branch
git push ssh://some.host.com/some/path/some.git some-branch:master
git subtree pull --prefix=some/prefix ssh://some.host.com/some/path/some.git master

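The first command puts the subtree's history on some-branch, the second pushes that branch to the remote as master, and the third syncs changes back from the remote.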

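If something goes wrong, git reset is the best rescue. It even accepts revisions such as git reset --hard master@{"20 minutes ago"} or master@{2}, which name where master pointed 20 minutes ago or two updates ago. The useful companion is git reflog, which lists those earlier positions so you can see what to reset to. To undo a reset, reset again to the reflog entry just before it. If not much time has passed, git has probably not yet cleaned up the discarded data, so it can still be recovered.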
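
As a rough illustration of the reset-and-recover cycle described above, the sketch below builds a throwaway repository, discards a commit with git reset --hard, and brings it back through the reflog. The temporary directory, the file name, and the demo identity are made up for the example. HEAD@{1} is used instead of master@{1} only because the default branch name depends on the local git configuration.

```shell
#!/bin/sh
set -e

# Hypothetical identity so the commits work on a machine without git config.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Throwaway repository with two commits.
repo=$(mktemp -d)
cd "$repo"
git init -q
echo one > file.txt
git add file.txt
git commit -q -m "first"
echo two > file.txt
git commit -q -am "second"
good=$(git rev-parse HEAD)

# The slip: throw away the latest commit.
git reset -q --hard HEAD~1

# The reflog still records where HEAD pointed before the reset.
git reflog -n 3

# HEAD@{1} is the position just before the reset, so this undoes it.
git reset -q --hard 'HEAD@{1}'
restored=$(git rev-parse HEAD)
echo "restored commit: $restored"
```

The same syntax works on a branch's own reflog, so master@{1} or master@{"20 minutes ago"} can stand in for HEAD@{1} when the branch is called master.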

I am considering using git svn later on.

opencv

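Two functions, imdecode and imencode, decode an image from a file held in memory and encode an image back into a memory buffer. The code below also serves as a small STL refresher.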

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>
#include <iomanip>
#include <opencv2/opencv.hpp>

int
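main (int argc, char* argv[]) {
  // expect an input path and an output path
  if (argc < 3)
    return 1 ;
  std::ifstream in (argv[1], std::ifstream::binary|std::ifstream::in) ;
  if (!in.is_open ())
    return 1 ;
  // required: without it operator>> skips whitespace bytes and corrupts the data
  in >> std::noskipws ;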

  // copy to buffer
  std::vector<uchar> in_buf, out_buf ;
  std::copy (std::istream_iterator<uchar> (in),
             std::istream_iterator<uchar> (),
             std::back_insert_iterator<std::vector<uchar> > (in_buf)) ;
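  // decode the buffer into a cv::Mat (flag 1 = load as 3-channel color)
  cv::Mat im = cv::imdecode (cv::Mat (in_buf), 1) ;
  if (im.empty ())
    return 1 ;
  // encode the image as JPEG into a std::vector
  if (!cv::imencode (".jpg", im, out_buf))
    return 1 ;
  // write the encoded bytes to the output file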
  std::ofstream out (argv[2], std::ofstream::binary) ;
  std::copy (out_buf.begin (), out_buf.end (),
             std::ostream_iterator<uchar> (out)) ;

  return 0 ;
}

scrapy

A very handy tool for crawling web pages. Here is a simple spider for 360buy, built on BaseSpider, as an example. I plan to extend it later.

# -*- coding: utf-8 -*-
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from mydeals.items import Deal
import re

class jd_spider (BaseSpider):
    name = '360buy'
    allowed_domains = ['360buy.com']

    start_urls = [
        'http://tuan.360buy.com/beijing-0-0-1-0-0-index.html'
    ]

    def parse (self, response):
        deals = list ()
        hxs = HtmlXPathSelector (response)
        # select every div whose id is deal-intro
        divs = hxs.select ('//div[@id="deal-intro"]')
        for div in divs:
            d = Deal ()
            # the name and url are in the second anchor inside h1
            t = div.select ('h1/a[2]')
            d['name'] = t.select ('text()').extract ()[0].strip()
            d['url']  = 'http://tuan.360buy.com/' + t.select ('@href').extract ()[0]
            # the price has deal-price class
            d['price'] = div.select ('descendant::p[@class="deal-price"]/strong/text()').extract ()[0]
            # find the original price with del
            t = div.select ('descendant::del')
            d['rprc'] = t.select ('text()').extract ()[0]
            # the discount is the sibling td
            d['disct'] = t.select ('parent::*/following-sibling::*[starts-with(@id, "team-discount")]/text()').extract ()[0]
            # the expire time can be computed from deal-timeleft class
            t = div.select ('descendant::div[contains(@class, "deal-timeleft")]')
            d['exptm'] = long(t.select('@curtime').extract()[0]) + \
                long(t.select('@diff').extract()[0])
            # address
            t = div.select ('descendant::div[@class="looa"]')
            if len(t) > 0:
                d['addr'] = t.select('div/text()').extract()[0]
            else:
                d['addr'] = '北京'
            deals.append (d)
        return deals

——————
And he went in to Hagar, and she conceived: and when she saw that she had conceived, her mistress was despised in her eyes.

One thought on "A few small tips"

  1. zt says:

    A follow-up on git subtree: I still found it awkward to use. I later came across a blog post and now follow its workflow:
    http://psionides.eu/2010/02/04/sharing-code-between-projects-with-git-subtree/

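    The idea is not to split a directory straight out of the project, but to create a separate repository first and then bring it back in with subtree add. To keep the existing history, first run subtree split so that the subdirectory's commits are annotated and placed on a separate branch. Then either pull that branch from the new, empty repository (a bare one, although it does not have to be bare) or push the branch into it. Doing only this, however, does not seem to let the two sides stay in sync (this needs further verification).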

    So we make the new repository a tracking remote of the original one, which only takes a fetch. Then rm the original subdirectory and use subtree add to restore the tracking remote's content at the original location.

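    After that, to pick up changes made in the new repository, fetch and then subtree merge. To send changes to the new repository, first do the steps above (to make sure your side is up to date), then subtree split to a branch and push that branch to the new repository. Repeated splits work in this setup. With the earlier approach of splitting directly and then committing, further splits did not seem able to add the new commits to the branch.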
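
The workflow in this comment can be sketched as the script below, which replays it in a scratch directory. It is only a sketch under assumptions: the repository names (main, lib.git, lib-clone), the remote name lib, the file names, and the demo identity are all invented for illustration, and local paths stand in for the ssh URL. It also assumes a git installation that ships the subtree command.

```shell
#!/bin/sh
set -e

# Hypothetical identity and settings so the demo runs unattended.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
export GIT_MERGE_AUTOEDIT=no

work=$(mktemp -d)
cd "$work"

# The original project (main) and the new, empty repository (lib.git).
git init -q main
git init -q --bare lib.git
cd main
git symbolic-ref HEAD refs/heads/master
mkdir -p some/prefix
echo v1 > some/prefix/lib.txt
echo app > app.txt
git add .
git commit -q -m "initial import"

# 1. Split the subdirectory's history onto a branch and push it out.
git subtree split --prefix=some/prefix --annotate="(split) " -b some-branch
git push -q ../lib.git some-branch:master

# 2. Make the new repository a tracking remote; a fetch is enough.
git remote add lib ../lib.git
git fetch -q lib

# 3. Remove the old subdirectory and add it back as a subtree.
git rm -r -q some/prefix
git commit -q -m "remove some/prefix before re-adding it as a subtree"
git subtree add --prefix=some/prefix lib/master

# 4. Someone changes the new repository; sync with fetch + subtree merge.
git clone -q -b master ../lib.git ../lib-clone
(
  cd ../lib-clone
  echo v2 > lib.txt
  git commit -q -am "lib: v2"
  git push -q origin HEAD:master
)
git fetch -q lib
git subtree merge --prefix=some/prefix lib/master
synced=$(cat some/prefix/lib.txt)

# 5. Change the subtree locally, split again, and push the branch.
echo v3 > some/prefix/lib.txt
git commit -q -am "main: v3"
git subtree split --prefix=some/prefix -b some-branch
git push -q ../lib.git some-branch:master

pushed=$(git --git-dir=../lib.git show master:lib.txt)
echo "after sync: $synced, after push: $pushed"
```

Step 4 before step 5 mirrors the advice in the comment: bring your side up to date first, so that the branch produced by the second split descends from what the new repository already has and the push is a fast-forward.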
